No matter your particular focus, you’ve likely had a data science model fail in production despite very positive results up to that point. Why?
Most data science models don’t fail because the math is wrong. They fail because reality is.
What Usually Breaks
In data science, the training data often looks clean, balanced, and well-behaved. Production data is none of those things. Schemas drift, upstream systems change, users behave differently, and edge cases become the norm instead of the exception.
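As an illustration, here is a minimal sketch of what catching those surprises can look like, assuming a pandas-based pipeline; the column names, the EXPECTED_SCHEMA mapping, and the check_batch function are hypothetical examples, not a prescription:

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical expected schema: column name -> dtype the pipeline was built for.
EXPECTED_SCHEMA = {"user_id": "int64", "age": "float64", "spend_30d": "float64"}

def check_batch(train_df: pd.DataFrame, prod_df: pd.DataFrame, alpha: float = 0.01) -> list[str]:
    """Return warnings about schema drift or distribution shift in a production batch."""
    warnings = []

    # Schema drift: missing columns or changed dtypes in the production batch.
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in prod_df.columns:
            warnings.append(f"missing column: {col}")
        elif str(prod_df[col].dtype) != dtype:
            warnings.append(f"dtype changed for {col}: {prod_df[col].dtype} (expected {dtype})")

    # Distribution shift: two-sample Kolmogorov-Smirnov test per numeric feature.
    for col in ("age", "spend_30d"):
        if col in prod_df.columns and col in train_df.columns:
            stat, p_value = ks_2samp(train_df[col].dropna(), prod_df[col].dropna())
            if p_value < alpha:
                warnings.append(f"distribution shift in {col} (KS={stat:.3f}, p={p_value:.4f})")

    return warnings
```

Even a check this simple, run on every batch, turns "the model seems off" into a concrete, timestamped warning.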
What Gets Underestimated
The model is often the smallest part of the system. Data pipelines, feature freshness, monitoring, retraining cadence, and human overrides matter more day-to-day than another point of AUC (Area Under the ROC Curve). If those pieces are brittle, the model will quietly decay.
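To make the monitoring piece concrete, here is a minimal sketch of a population stability index (PSI) check, one common way to track drift between a training baseline and live feature values. The function name and the thresholds mentioned in the docstring are conventional rules of thumb, not guarantees:

```python
import numpy as np

def population_stability_index(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline (training) sample and a current (production) sample.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 worth watching, > 0.25 investigate.
    """
    # Bin edges come from the baseline so both samples are compared on the same grid.
    # Production values outside the baseline range fall outside the bins and are ignored here.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)

    # Convert counts to proportions; a small epsilon avoids division by zero and log(0).
    eps = 1e-6
    expected_pct = np.clip(expected / expected.sum(), eps, None)
    actual_pct = np.clip(actual / actual.sum(), eps, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
```

A scheduled job that computes this per feature and alerts on the threshold is often worth more than an extra point of offline AUC.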
What Teams Miss
Ownership doesn’t end at deployment. Without clear accountability for performance in production, models slowly turn into expensive technical debt.
The Takeaway
Before tuning hyperparameters, stress-test your assumptions. Ask how the model will fail, how you’ll detect it, and who is responsible when it does.
Curious to hear from you: what was the real reason your last model struggled — or succeeded — in production?
Thanks,
Tom Myers
P.S. Please also connect with DIH on LinkedIn.