Customer + prospect intelligence
Customer segmentation, prospect lookalike scoring, repurchase prediction, and customer-potential scoring across ~34K customers and a six-figure external-institution reference universe. Surfaced an eight-figure identified growth opportunity across the top-scoring accounts.
What Customer + prospect intelligence proves.
This page covers the data problem behind Customer + prospect intelligence and the decision the numbers made easier. Where I can share the figures, I do. Where I can't, I say so.
Where this sits
Same anonymized B2B manufacturer as the forecasting program. After the forecasting work landed, the next question was: which existing customers should the sales team double down on, and which non-customer institutions are most likely to convert?
The segmentation layer
K-Means on 10 features: Recency, Frequency, and Monetary value (RFM) plus Avg Order Value, Tenure, Gross Margin Ratio, Items per Order, Purchase Regularity, Product Lines, and Unique Items. The model uses k=5 clusters and reaches a silhouette score of 0.54. A rule-based naming layer then maps the 5 clusters to 7 named segments: Champions, Loyal, At Risk, Dormant, New, Developing, Occasional. The top segment is roughly 10% of customers and well over two-thirds of revenue.
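As a rough illustration of what the silhouette figure measures, here is a minimal stdlib-only sketch. The function name and the four toy points are assumptions made up for this example; it is not the production pipeline, which ran on 10 features across ~34K customers.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (hypothetical): two tight, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_score = silhouette_score(toy_points, toy_labels)
print(round(toy_score, 3))
```

Scores near 1 mean points sit much closer to their own cluster than to the nearest neighboring one, scores near 0 mean clusters overlap. The toy data is far cleaner than real customer data, so it scores higher than the 0.54 reported above.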
The prospect intelligence stack
Four more models sit on top of segmentation, each answering a different sales question.
- External-institution fuzzy match: a 6-metric ensemble (Token Set Ratio, Token Sort Ratio, Jaro-Winkler, Jaccard, Partial Ratio, Levenshtein) with a 73.2% high-confidence match rate
- Lookalike propensity classifier: XGBoost with a test ROC-AUC of 0.869, scoring 108K+ non-customer institutions
- Customer potential scoring: each account is benchmarked against the top performers in its own segment, which surfaces an eight-figure growth opportunity across the top-scoring accounts
- Repurchase prediction: models orders as bimodal (Major vs Maintenance) and outputs quarterly purchase probabilities and expected dollar amounts
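To make the first item in the stack more concrete, here is a minimal sketch of an ensemble name matcher. It is an assumption-laden simplification: it covers only three of the six metrics (Levenshtein, Token Sort Ratio, Jaccard), uses `difflib.SequenceMatcher` from the standard library as a stand-in for the token-sort ratio, and combines them with an equal-weight average. The institution names and the equal weighting are hypothetical, not taken from the project.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and replace punctuation with spaces so formatting noise is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def levenshtein_ratio(a: str, b: str) -> float:
    """Edit-distance similarity in [0, 1]: 1 minus distance over the longer length."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def token_sort_ratio(a: str, b: str) -> float:
    """Similarity after sorting tokens, so word order does not matter."""
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two names have in common."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def ensemble_score(name_a: str, name_b: str) -> float:
    """Equal-weight average of the three metrics (the weighting is an assumption)."""
    a, b = normalize(name_a), normalize(name_b)
    metrics = [levenshtein_ratio(a, b), token_sort_ratio(a, b), jaccard(a, b)]
    return sum(metrics) / len(metrics)


# Made-up names: a reordered match should outscore an unrelated pair.
reordered = ensemble_score("University of Springfield", "Springfield, University of")
unrelated = ensemble_score("Acme Dental Clinic", "Northside High School")
print(reordered > unrelated)
```

The point of an ensemble is that each metric fails differently: plain edit distance punishes reordered words, while the token-based metrics ignore order but miss typos inside a token. Averaging several of them makes a single metric's blind spot less decisive.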
The archetype layer (above segmentation)
Behavior-based account archetypes layered over RFM: Core Expansion, Programmatic Repeat, Steady Mid-Market, Early Lifecycle, and others. Archetype patterns were transferred across domestic segments and recalibrated with isotonic calibration, which cut calibration error by an order of magnitude. Top-1 archetype accuracy is ~57%, and top-2 is above 80%. Sales gets a ranked list with archetypes that match how their existing customers actually behave, not a generic 'high score / low score' list.
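For readers unfamiliar with the top-1 and top-2 figures, here is a minimal sketch of how top-k accuracy is computed. The function name, the four accounts, and their probabilities are invented for illustration. The archetype names come from the list above.

```python
def top_k_accuracy(prob_rows, true_labels, k):
    """Fraction of accounts whose true archetype is among the k highest-probability ones."""
    hits = 0
    for probs, truth in zip(prob_rows, true_labels):
        # Rank archetype names from most to least probable.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Hypothetical predicted probabilities for four accounts.
predictions = [
    {"Core Expansion": 0.6, "Programmatic Repeat": 0.3, "Steady Mid-Market": 0.1},
    {"Core Expansion": 0.5, "Programmatic Repeat": 0.4, "Steady Mid-Market": 0.1},
    {"Core Expansion": 0.2, "Programmatic Repeat": 0.1, "Steady Mid-Market": 0.7},
    {"Core Expansion": 0.7, "Programmatic Repeat": 0.2, "Steady Mid-Market": 0.1},
]
actual = ["Core Expansion", "Programmatic Repeat", "Steady Mid-Market", "Steady Mid-Market"]

top1 = top_k_accuracy(predictions, actual, k=1)
top2 = top_k_accuracy(predictions, actual, k=2)
print(top1, top2)  # prints 0.5 0.75
```

The second account shows why top-2 matters: the true archetype is the runner-up, so the top-1 guess is wrong but the right answer is still on the short list.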