Anonymized case study

Revenue forecasting (anonymized B2B)

Multi-year revenue forecasting program for an anonymized B2B manufacturer: XGBoost on 27 engineered features, walk-forward validation, a US state disaggregation that cut MAPE in half, and an autoresearch agent campaign that drove global mean MAPE to 3.77%.

Most of the win wasn't model architecture. It was picking the right level of detail to forecast at, and letting an autonomous research agent run 369 experiments while I slept.

Program card

type: Anonymized case study

status: Anonymized case study

page: /forecasting

The problem

Leadership at an anonymized B2B manufacturer needed monthly and quarterly revenue forecasts they could plan on, at the segment and territory level. The existing approach was a smoothed average of last year. It worked fine until the year stopped looking like last year; then naive baselines (previous period, same period last year) outperformed it.

What I built

Three model families on shared feature engineering. Pooled segment forecasts: two domestic and one international across 144 countries. A US state-disaggregated model that cut error in half. Per-product XGBoost for the 9 product lines covering 97% of revenue.
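The source doesn't say how the 9 product lines were chosen. One simple rule for picking the smallest set of lines that covers a revenue share like 97% is to sort lines by revenue and cut where the cumulative share first reaches the threshold. The sketch below assumes that rule; the function name and the input shape (a pandas Series of revenue indexed by product line) are hypothetical.

```python
import pandas as pd


def top_products_by_revenue_share(revenue: pd.Series, threshold: float = 0.97) -> list:
    """Smallest set of product lines whose combined revenue share reaches `threshold`.

    Hypothetical helper: `revenue` is indexed by product line. The selection rule
    (sort by revenue, cut at the cumulative share) is an assumption, not the source's.
    """
    shares = revenue.sort_values(ascending=False) / revenue.sum()
    cumulative = shares.cumsum()
    # cumulative is non-decreasing, so the count of entries still below the
    # threshold is the position of the first entry that reaches it; add one to include it.
    n_needed = int((cumulative < threshold).sum()) + 1
    return list(shares.index[:n_needed])
```

For example, with revenue shares of 60/25/10/3/2%, a 0.97 threshold keeps the top four lines.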

The methodology that actually moved the numbers

These are the non-obvious moves. Each one is the kind of decision a portfolio post can't usually show, because most portfolios summarize results, not the moves that produced them.

Aggregation level is the real lever. Forecasting at Region × Product × Country × Week gave 23% MAPE; at Region × Week it dropped to 14%. Same data, same model, different granularity.
State disaggregation: 47 weakly correlated state forecasts, summed to the US level, beat the US-total model 9.04% vs 18.80% MAPE. The state errors had correlation rho = 0.089, low enough that they largely cancel when summed.
27 engineered features in 7 categories: calendar, cyclical sin/cos (so December and January are adjacent), lag at 1/2/3/6/12/24 months, rolling MA/STD, volatility, trend, domain calendar flags.
Target encoding with smoothing=10 on high-cardinality columns. K-fold inside the training set to prevent target leakage.
Walk-forward temporal split. 2020 was explicitly excluded from training as a distribution shift.
Naive baselines are the floor. Naive1 was 2,347% weekly MAPE. If you don't beat that, you have a science project, not a model.
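A back-of-the-envelope way to see why weakly correlated state errors help: under idealized assumptions (n equal-sized states, the same error standard deviation in each, and one common pairwise error correlation rho), the relative error of the summed forecast equals the single-state relative error times sqrt((1 + (n - 1) * rho) / n). The sketch below computes that factor and checks it by simulation. It illustrates the mechanism only; it does not reproduce the 9.04% vs 18.80% comparison, which pits the summed states against a separately trained US-total model. Function names are hypothetical.

```python
import numpy as np


def relative_error_factor(n: int, rho: float) -> float:
    """Relative error of the summed forecast divided by one state's relative error.

    Idealized assumptions: n equal-sized states, identical error std per state,
    and the same error correlation rho between every pair of states.
    """
    return float(np.sqrt((1 + (n - 1) * rho) / n))


def simulate_factor(n: int, rho: float, n_draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of relative_error_factor using equicorrelated Gaussian errors."""
    rng = np.random.default_rng(seed)
    # Unit-variance state errors with pairwise correlation rho.
    cov = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
    errors = rng.multivariate_normal(np.zeros(n), cov, size=n_draws)
    # Std of the summed error relative to the total level (n states of equal size),
    # divided by the per-state std of 1.
    return float(errors.sum(axis=1).std() / n)
```

With n = 47 and rho = 0.089, the factor comes out to about 0.33: under these assumptions, the total's relative error is roughly a third of a single state's.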
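A minimal pandas sketch of four of the seven feature categories listed above (calendar, cyclical sin/cos, lags, rolling mean/std), assuming a single monthly series with a DatetimeIndex and a hypothetical `revenue` column. Volatility, trend, and the domain calendar flags are left out because the source doesn't define them, and the 3-month rolling window is an illustrative choice.

```python
import numpy as np
import pandas as pd


def add_time_series_features(df: pd.DataFrame, target: str = "revenue") -> pd.DataFrame:
    """Add calendar, cyclical, lag, and rolling features to one monthly series.

    Assumes a monthly DatetimeIndex sorted ascending; `revenue` is a hypothetical column name.
    """
    out = df.copy()
    month = np.asarray(out.index.month)

    # Calendar: plain month number.
    out["month"] = month
    # Cyclical: months sit on the unit circle, so December (12) and January (1)
    # are exactly as close as any other pair of consecutive months.
    out["month_sin"] = np.sin(2 * np.pi * month / 12)
    out["month_cos"] = np.cos(2 * np.pi * month / 12)

    # Lags at 1/2/3/6/12/24 months: strictly past values of the target.
    for lag in (1, 2, 3, 6, 12, 24):
        out[f"lag_{lag}"] = out[target].shift(lag)

    # Rolling mean/std over the previous 3 months (window length is hypothetical);
    # shifting first keeps the current month out of its own feature.
    past = out[target].shift(1)
    out["roll_mean_3"] = past.rolling(3).mean()
    out["roll_std_3"] = past.rolling(3).std()
    return out
```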
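Out-of-fold smoothed target encoding, as in the target-encoding item above, can be sketched with scikit-learn's `KFold`. The smoothing formula used here, (n * category_mean + m * prior) / (n + m) with m = 10, is a common form and an assumption; the source only gives smoothing=10. Column names, the fold count, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def smoothed_means(keys: pd.Series, y: pd.Series, smoothing: float, prior: float) -> pd.Series:
    """Per-category mean shrunk toward `prior`: (n * mean + m * prior) / (n + m)."""
    stats = y.groupby(keys).agg(["sum", "count"])
    # n * mean is just the category sum, so no separate mean is needed.
    return (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)


def oof_target_encode(train: pd.DataFrame, col: str, target: str,
                      smoothing: float = 10.0, n_splits: int = 5, seed: int = 0) -> pd.Series:
    """Encode each training row using only the other folds, so a row never sees its own target."""
    encoded = pd.Series(np.nan, index=train.index, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in folds.split(train):
        fit = train.iloc[fit_idx]
        prior = fit[target].mean()
        means = smoothed_means(fit[col], fit[target], smoothing, prior)
        # Categories unseen in the fitting folds fall back to the prior.
        encoded.iloc[enc_idx] = train[col].iloc[enc_idx].map(means).fillna(prior).to_numpy()
    return encoded
```

Validation and test rows would be encoded with `smoothed_means` fitted on the full training set, never with their own targets.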
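A walk-forward split with 2020 dropped from every training window might look like the sketch below. Yearly test folds and an expanding training window are assumptions made for illustration; the source doesn't state the fold size, and the function name is hypothetical.

```python
import pandas as pd


def walk_forward_splits(dates: pd.Series, first_test_year: int, last_test_year: int,
                        excluded_years=(2020,)):
    """Yield (test_year, train_mask, test_mask) for expanding-window, year-by-year folds.

    Each fold trains on all years before the test year except `excluded_years`
    and tests on the test year only. Yearly folds are an illustrative assumption.
    """
    years = dates.dt.year
    for test_year in range(first_test_year, last_test_year + 1):
        # Strictly earlier data only, with the distribution-shift year removed.
        train_mask = (years < test_year) & ~years.isin(list(excluded_years))
        test_mask = years == test_year
        yield test_year, train_mask, test_mask
```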
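The two naive baselines from the problem statement (previous period, same period last year) and MAPE itself fit in a few lines of NumPy. MAPE is unbounded above: a single actual close to zero can dominate the mean, so weekly series with near-zero weeks are especially exposed. Function names are hypothetical.

```python
import numpy as np


def mape(actual, forecast) -> float:
    """Mean absolute percentage error, in percent. Undefined if any actual is zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


def naive1(history, horizon: int) -> np.ndarray:
    """Previous-period baseline: repeat the last observed value."""
    return np.repeat(np.asarray(history, dtype=float)[-1], horizon)


def seasonal_naive(history, horizon: int, season: int) -> np.ndarray:
    """Same-period-last-year baseline: each step copies the value one season earlier."""
    history = np.asarray(history, dtype=float)
    # np.resize cycles the last season if the horizon is longer than one season.
    return np.resize(history[-season:], horizon)
```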

The autoresearch story

March 2026. An autonomous Claude Code agent ran 369 experiments over 10 days. Global mean MAPE dropped from ~7% to 3.77%. Seven named discoveries: the data-leakage trap (contemporaneous features looked great until I checked), top-k=20 ensemble averaging, a unified domestic model (the two domestic segments pooled), a global model across all segments, heterogeneous XGBoost + CatBoost 90/10 stacking, a domestic-only seasonal index (sm=15), and `Hist_YoY_Ratio`, an expanding-mean year-over-year ratio feature that turned out to be one of the strongest single features in the stack.
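Top-k ensemble averaging (k=20 in the campaign) can be read as: rank candidate models by validation MAPE and average the forecasts of the best k. That ranking rule is an assumption, since the source doesn't say how candidates were scored; the function name is hypothetical. A NumPy sketch:

```python
import numpy as np


def top_k_average(predictions: np.ndarray, val_mape: np.ndarray, k: int = 20) -> np.ndarray:
    """Average the forecasts of the k candidates with the lowest validation MAPE.

    predictions: shape (n_candidates, n_periods); val_mape: shape (n_candidates,).
    If k exceeds the number of candidates, all candidates are averaged.
    """
    best = np.argsort(val_mape)[:k]
    return predictions[best].mean(axis=0)
```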
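The source describes `Hist_YoY_Ratio` only as an expanding-mean year-over-year ratio feature. One possible reading, sketched below with pandas, takes each month's ratio to the same month a year earlier and averages all ratios observed strictly before the current month. The exact definition and the function name are assumptions; the `shift(1)` is there because of the contemporaneous-feature leakage trap mentioned above.

```python
import pandas as pd


def hist_yoy_ratio(series: pd.Series, season: int = 12) -> pd.Series:
    """One possible reading of an expanding-mean year-over-year ratio feature.

    Assumes a regularly spaced monthly series. The ratio at month t is
    y[t] / y[t - season]; the feature at t is the mean of all ratios before t.
    """
    yoy = series / series.shift(season)
    # shift(1) keeps month t's own ratio (which uses the month-t target) out of its feature;
    # expanding().mean() skips the leading NaNs from the first season.
    return yoy.shift(1).expanding().mean()
```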