Machine learning
Anonymized case study

Customer + prospect intelligence

Customer segmentation, prospect lookalike scoring, repurchase prediction, and customer-potential scoring across ~34K customers and a six-figure external-institution reference universe. The work surfaced an eight-figure growth opportunity across the top-scoring accounts.

RFM is unfashionable. It also works, especially when you stack a lookalike model on top to score the six-figure universe of institutions that aren't customers yet.
Program card
Type: Anonymized case study
Status: Anonymized case study
Page: /customer-intelligence
AI avatar summary

What Customer + prospect intelligence proves.

The data problem behind Customer + prospect intelligence and the decision the numbers made easier. Where I can share the figures, I do. Where I can't, I say so.

1

Where this sits

Same anonymized B2B manufacturer as the forecasting program. After the forecasting work landed, the next question was: which existing customers should the sales team double down on, and which non-customer institutions are most likely to convert?

2

The segmentation layer

K-Means on 10 features: RFM (recency, frequency, monetary value) plus Avg Order Value, Tenure, Gross Margin Ratio, Items per Order, Purchase Regularity, Product Lines, and Unique Items. With k=5 clusters, the silhouette score was 0.54. A rule-based naming layer then mapped the clusters to 7 named segments: Champions, Loyal, At Risk, Dormant, New, Developing, and Occasional. The top segment is roughly 10% of customers and well over two-thirds of revenue.
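
A minimal sketch of the clustering step, assuming toy customers with three features rather than the ten listed above. It uses only the Python standard library. The function names and the farthest-point initialisation are illustrative assumptions, not the project's code, and in practice a library K-Means would be the usual choice.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns keep scale 1
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a deterministic start (an assumption made
    here for reproducibility; random or k-means++ starts are more common)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette: (b - a) / max(a, b) per point, where a is the mean
    distance to the point's own cluster and b to the nearest other cluster."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                dists.setdefault(lab, []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:  # singleton cluster or only one cluster
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean(d) for d in dists.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)
```

Standardizing first matters because the features are on very different scales (days, counts, dollars). The silhouette score is one way to compare candidate values of k before settling on one.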

3

The prospect intelligence stack

On top of segmentation, four more models stack to answer different sales questions.

  • External-institution fuzzy match: 6-metric ensemble (Token Set Ratio, Token Sort Ratio, Jaro-Winkler, Jaccard, Partial Ratio, Levenshtein) → 73.2% high-confidence match rate
  • Lookalike propensity classifier: XGBoost, test ROC-AUC 0.869, scoring 108K+ non-customer institutions
  • Customer potential scoring: peer-benchmarked against same-segment top performers, surfaces an eight-figure growth opportunity across the top-scoring accounts
  • Repurchase prediction: bimodal Major vs Maintenance order modeling, quarterly purchase probabilities and expected dollar amounts
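
The fuzzy-match idea from the first bullet can be sketched with the standard library alone. This is a simplified three-metric stand-in (token-sort ratio, token Jaccard, Levenshtein similarity), not the six-metric ensemble described above. The equal weights, the 0.85 threshold, and all function names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def _tokens(name):
    """Lowercase and split on whitespace, treating commas and periods as spaces."""
    return name.lower().replace(",", " ").replace(".", " ").split()


def token_sort_ratio(a, b):
    """Similarity of the two names after sorting their tokens alphabetically."""
    sa, sb = " ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b)))
    return SequenceMatcher(None, sa, sb).ratio()


def jaccard(a, b):
    """Shared tokens divided by all distinct tokens."""
    ta, tb = set(_tokens(a)), set(_tokens(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def levenshtein_similarity(a, b):
    """1 - edit_distance / longer_length, computed row by row."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    longest = max(len(a), len(b))
    return 1 - prev[-1] / longest if longest else 1.0


def ensemble_score(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three metrics (equal weights are an assumption)."""
    scores = (token_sort_ratio(a, b), jaccard(a, b), levenshtein_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def best_match(name, candidates, threshold=0.85):
    """Return (best candidate, score, is_high_confidence)."""
    best = max(candidates, key=lambda c: ensemble_score(name, c))
    score = ensemble_score(name, best)
    return best, score, score >= threshold
```

Combining several metrics helps because each fails differently: token-based scores ignore word order, while character-level edit distance tolerates abbreviations and typos. A high-confidence match rate is then the share of records whose best score clears the chosen threshold.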
4

The archetype layer (above segmentation)

Behavior-based account archetypes layered over RFM: Core Expansion, Programmatic Repeat, Steady Mid-Market, Early Lifecycle, and others. Transferring archetype patterns across domestic segments with isotonic calibration cut calibration error by an order of magnitude. Top-1 archetype accuracy is ~57%, and top-2 is above 80%. Sales gets a ranked list with archetypes that match how their existing customers actually behave, not a generic 'high score / low score' list.
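
To illustrate the calibration step, here is a minimal sketch of isotonic calibration using the pool-adjacent-violators algorithm, written for a single binary outcome (for several archetypes, one would typically calibrate each class one-vs-rest). The source does not say how calibration error was measured, so the binned expected calibration error below is an assumption, as are the function names and the step-function lookup.

```python
from bisect import bisect_left
from statistics import mean


def fit_isotonic(scores, outcomes):
    """Pool adjacent violators: fit a non-decreasing step function that
    maps raw scores to observed outcome rates."""
    blocks = []  # each block: [sum of outcomes, count, max score in block]
    for s, y in sorted(zip(scores, outcomes)):
        if blocks and blocks[-1][2] == s:  # tied scores share one block
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([y, 1, s])
        # Merge backwards while an earlier block has a higher mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Look up the fitted value of the first block that covers the score;
    scores above the last threshold get the last block's value."""
    idx = bisect_left(thresholds, score)
    return values[min(idx, len(values) - 1)]


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between predicted probability and observed rate
    across equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    n = len(probs)
    return sum(
        len(b) / n * abs(mean([p for p, _ in b]) - mean([y for _, y in b]))
        for b in bins
        if b
    )
```

In a transfer setting, the mapping would be fitted on held-out data from the target segment and then applied to new scores there. Fitting and evaluating on the same records, as a quick test might, overstates the improvement.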