Revenue forecasting (anonymized B2B)
Multi-year revenue forecasting program for an anonymized B2B manufacturer: XGBoost on 27 engineered features, walk-forward validation, a state disaggregation that cut MAPE in half, and an autoresearch agent campaign that drove global mean MAPE to 3.77%.
What Revenue forecasting (anonymized B2B) proves.
The data problem behind Revenue forecasting (anonymized B2B) and the decision the numbers made easier. Where I can share the figures I do. Where I can't, I say so.
The problem
Leadership at an anonymized B2B manufacturer needed monthly and quarterly revenue forecasts they could plan on, at the segment and territory level. The existing approach was a smoothed average of last year. It worked fine until the year stopped looking like last year. After that, even naive baselines (previous period, same period last year) outperformed it.
What I built
Three model families on shared feature engineering: pooled segment forecasts (two domestic segments and one international segment spanning 144 countries), a US state-disaggregated model that cut error in half, and per-product XGBoost models for the 9 product lines covering 97% of revenue.
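To make the state-disaggregated idea concrete, here is a minimal sketch of bottom-up aggregation and of why weakly correlated errors shrink when summed. It is an illustration under simplifying assumptions, not the project's code: the function names are hypothetical, and the shrink factor assumes every state has the same error variance and a single common pairwise correlation `rho`.

```python
import math


def bottom_up_total(state_forecasts):
    """Sum per-state forecasts (dict of state -> value) into a national forecast."""
    return sum(state_forecasts.values())


def error_shrink_factor(n, rho):
    """Std of the summed error relative to the fully correlated case (rho = 1).

    Assumes n component errors with equal variance and one common pairwise
    correlation rho, so Var(sum) = n * sigma^2 * (1 + (n - 1) * rho),
    compared with n^2 * sigma^2 when rho = 1.
    """
    return math.sqrt((1 + (n - 1) * rho) / n)


# Hypothetical per-state forecasts, summed to the national level.
us_forecast = bottom_up_total({"CA": 120.0, "TX": 95.0, "NY": 80.0})

# Shrink factor for 47 states at the error correlation reported on this page.
factor = error_shrink_factor(47, 0.089)
```

Under those assumptions the factor for 47 states at rho = 0.089 is roughly 0.33. Real states differ widely in size and error variance, so this is a directional argument for why the summed forecast can beat a single US-total model, not a derivation of the reported figures.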
The methodology that actually moved the numbers
These are the non-obvious moves. Each one is the kind of decision a portfolio post can't usually show, because most portfolios summarize results, not the moves that produced them.
- Aggregation level is the real lever. Forecasting at Region × Product × Country × Week gave 23% MAPE. Forecasting at Region × Week dropped it to 14%. Same data, same model, different granularity.
- State disaggregation: 47 weakly correlated state forecasts summed to the US level beat the US-total model, 9.04% vs 18.80% MAPE. The state-level errors were nearly uncorrelated (rho = 0.089), so much of the error cancels in the sum.
- 27 engineered features in 7 categories: calendar, cyclical sin/cos (so December and January are adjacent), lags at 1/2/3/6/12/24 months, rolling MA/STD, volatility, trend, domain calendar flags.
- Target encoding with smoothing=10 on high-cardinality columns. K-fold inside the training set to prevent target leakage.
- Walk-forward temporal split. 2020 explicitly excluded from training as distribution shift.
- Naive baselines are the floor. Naive1 was 2,347% weekly MAPE. If you don't beat that, you have a science project, not a model.
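Several of the moves above are easier to see in code. The following is a minimal, dependency-free sketch, not the project's implementation: all function names are hypothetical, the interleaved fold assignment in `oof_target_encode` is one simple way to do K-fold inside a training set, and the exclusion set stands in for dropping 2020 from training.

```python
import math


def month_cyclical(month):
    """Map month 1..12 onto the unit circle so December sits next to January."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def smoothed_means(cats, ys, smoothing=10.0):
    """Per-category target mean shrunk toward the global mean:
    (sum_cat + smoothing * global_mean) / (count_cat + smoothing)."""
    global_mean = sum(ys) / len(ys)
    sums, counts = {}, {}
    for c, y in zip(cats, ys):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    enc = {c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
           for c in sums}
    return enc, global_mean


def oof_target_encode(cats, ys, n_folds=5, smoothing=10.0):
    """Encode each training row from the other folds only (n_folds >= 2),
    so a row's encoding never includes its own target."""
    encoded = [0.0] * len(ys)
    for fold in range(n_folds):
        fit = [i for i in range(len(ys)) if i % n_folds != fold]
        enc, global_mean = smoothed_means(
            [cats[i] for i in fit], [ys[i] for i in fit], smoothing)
        for i in range(fold, len(ys), n_folds):
            # Categories unseen in the fitting folds fall back to the global mean.
            encoded[i] = enc.get(cats[i], global_mean)
    return encoded


def walk_forward_splits(n, min_train, horizon=1, exclude=frozenset()):
    """Yield (train_idx, test_idx); training is always strictly before testing.
    Indices in `exclude` (e.g. the periods of a shifted year) never enter training."""
    for end in range(min_train, n - horizon + 1, horizon):
        train = [i for i in range(end) if i not in exclude]
        yield train, list(range(end, end + horizon))


def mape(actual, forecast):
    """Mean absolute percentage error in percent; zero actuals are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def naive_previous(series, t):
    """Baseline: the forecast for period t is the value at t - 1."""
    return series[t - 1]


def naive_seasonal(series, t, season=12):
    """Baseline: the forecast for period t is the value one season earlier."""
    return series[t - season]
```

A candidate model would be scored with `mape` on each test window from `walk_forward_splits`, next to `naive_previous` and `naive_seasonal` on the same windows. If it does not beat both, the extra machinery is not earning its keep.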
The autoresearch story
March 2026. An autonomous Claude Code agent ran 369 experiments over 10 days. Global mean MAPE dropped from ~7% to 3.77%. Seven named discoveries:
- The data-leakage trap: contemporaneous features looked great until I checked.
- Top-k=20 ensemble averaging.
- A unified domestic model (the two domestic segments pooled).
- A global model across all segments.
- Heterogeneous XGBoost + CatBoost 90/10 stacking.
- A domestic-only seasonal index (sm=15).
- `Hist_YoY_Ratio`, an expanding-mean year-over-year ratio feature that turned out to be one of the strongest single features in the stack.
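Two of these discoveries lend themselves to a short sketch. The page does not give the exact definition of `Hist_YoY_Ratio` or the stacking recipe, so everything below is an assumption: the feature is read as the expanding mean of past year-over-year ratios, computed only from periods before the one being predicted (which also avoids the contemporaneous-feature leak), and the 90/10 stack is read as a fixed-weight average with 90% on XGBoost. Function names are hypothetical.

```python
def hist_yoy_ratio(series, season=12):
    """Expanding mean of y[t] / y[t - season], using only ratios observed
    strictly before period t. Returns None until one ratio is available."""
    out = []
    past_ratios = []
    for t, y in enumerate(series):
        # The feature for period t is computed before period t's own ratio is added.
        out.append(sum(past_ratios) / len(past_ratios) if past_ratios else None)
        if t >= season and series[t - season] != 0:
            past_ratios.append(y / series[t - season])
    return out


def blend(xgb_preds, cat_preds, xgb_weight=0.9):
    """Fixed-weight average of two models' predictions (assumed 90/10 split)."""
    return [xgb_weight * a + (1 - xgb_weight) * b
            for a, b in zip(xgb_preds, cat_preds)]


# Toy series with a 2-period "year" to keep the example short.
feature = hist_yoy_ratio([100.0, 100.0, 110.0, 121.0], season=2)
combined = blend([100.0], [200.0])
```

In the toy series the first three feature values are `None` because no year-over-year ratio has been fully observed yet. The fourth uses only the ratio from the third period, never its own.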