InternalYears of multi-vertical ad data

Industry paid-ad ML model

Years of paid-ad data across 165+ accounts in paving, garage door, and Heaviside-managed verticals. Warehoused, vertical-labeled at ingest, ML feature view per account-day, production scoring engine that writes risk scores and threshold alerts on every warehouse run. Same methodology as the anonymized B2B forecasting work, applied to my own ad data.

1,113 ads scanned once is a snapshot. Years of multi-account, multi-vertical, multi-provider data with an ML scoring engine on top is a moat.

Program card

typeYears of multi-vertical ad data

statusInternal

page/ml-industry-ad-model

Why this exists

Google Ads UI and Meta Ads Manager show you last week. They don't show you a paving account quietly deteriorating against its vertical benchmark, or an account whose tracking has been broken since 2023 and is still spending. They don't compare across verticals. The warehouse does.

What's in the warehouse

Daily metrics per account. Vertical labels assigned at ingest from MCC ancestry → client industry → heuristic. Lag and rolling-window features per account-day in a stable ML feature view. Baseline scoring per warehouse run, writing risk scores and threshold-driven alerts. An optional BigQuery sink for heavier downstream modeling.

Same methodology as the anonymized B2B work

This is the throughline. The methodology from the anonymized B2B forecasting program is the same stack I use on my own ad data. XGBoost on engineered features. Lag and rolling windows. Walk-forward validation. Drift monitoring. Quarterly retrain runbooks. The technology transferred. The only differences are the column names and the platform quirks.

Vertical labels at ingest so paving, garage door, and Heaviside data can be compared and benchmarked independently
Resumable backfill with per-account checkpoints (respects Meta's 37-month lookback and Google's daily quota)
Production hardening: Google invalid_grant auto-quarantine, Meta token-expired handling, queue depth metrics, alerting on worker-down and nightly-sync-absence
Three analytics APIs (`/api/analytics/quality`, `/api/analytics/benchmarks`, `/api/analytics/scores`) make the warehouse useful for decisions, not just reporting.

Why this beats '1,113 ads scanned'

The competitor scanner is a one-time research artifact. This is years of multi-account, multi-vertical, multi-provider data with an ML scoring engine on top, continuously ingesting under production constraints. Both have a place. This is the bigger story.