Machine Learning in Production: What Nobody Tells You
Getting an ML model to 90% accuracy in a notebook is exciting. Deploying it reliably in production, at scale and for real users, is where the real challenge begins.
The Gap Between Research and Production
Most discussions about machine learning focus on model training: datasets, architectures, accuracy metrics. But seasoned ML engineers know that, as a rule of thumb, the model is only about 10% of the work. The other 90% is everything that surrounds it in production: data pipelines, serving, monitoring, and retraining.
The Hidden Challenges
1. Data Distribution Shift
Your model was trained on last year's data, and the world changes. Customer behaviour, language patterns, and market conditions all drift. Without monitoring, your model silently degrades until someone notices the business impact.
Solution: Implement continuous data monitoring and scheduled retraining pipelines from day one.
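One common way to put a number on drift for a single numeric feature is the Population Stability Index (PSI), which compares how a training sample and a live sample fall into the same set of bins. The sketch below is a minimal, standard-library illustration, not the monitoring setup described in this article. The function name `population_stability_index`, the bin count, and the smoothing constant `eps` are assumptions made for the example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference (training) sample and a current (live) sample."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Live values outside the reference range land in the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Hypothetical data: a feature whose live values have shifted upwards by 3.0.
train = [i / 100 for i in range(1000)]
live_shifted = [x + 3.0 for x in train]
DRIFT_THRESHOLD = 0.2  # a frequently quoted rule of thumb, not a universal constant

if population_stability_index(train, live_shifted) > DRIFT_THRESHOLD:
    print("drift detected: consider triggering the retraining pipeline")
```

In practice a check like this would run on a schedule for each monitored feature, with thresholds tuned to the tolerance of the use case.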
2. Latency vs. Accuracy Trade-offs
The state-of-the-art model that takes 800ms to return a prediction is useless for a real-time recommendation engine. Production ML requires deliberate decisions about model size, quantisation, caching, and batching.
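Two of the levers named above, caching and batching, can be sketched without any serving framework. This is an illustrative sketch only: `make_cached_predictor`, `predict_in_batches`, and the `predict_batch` callable are hypothetical names, and the code assumes the model is a function that maps a list of feature tuples to a list of scores.

```python
from functools import lru_cache
from typing import Callable, List, Sequence, Tuple

Features = Tuple[float, ...]
BatchModel = Callable[[List[Features]], List[float]]


def make_cached_predictor(
    predict_batch: BatchModel, max_entries: int = 10_000
) -> Callable[[Features], float]:
    """Wrap a batch model so that repeated identical requests skip inference."""

    @lru_cache(maxsize=max_entries)
    def predict_one(features: Features) -> float:
        return predict_batch([features])[0]

    return predict_one


def predict_in_batches(
    predict_batch: BatchModel, rows: Sequence[Features], batch_size: int = 32
) -> List[float]:
    """Group rows into fixed-size batches to amortise per-call overhead."""
    results: List[float] = []
    for start in range(0, len(rows), batch_size):
        results.extend(predict_batch(list(rows[start:start + batch_size])))
    return results
```

Caching only pays off when identical feature tuples recur, and the cache has to be cleared whenever a new model version is deployed. Batching trades a little per-request waiting time for higher throughput, so the batch size should be chosen against the latency budget.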
3. Explainability for Business Users
A "black box" that achieves 94% accuracy is likely to be ignored by decision-makers who can't understand why it made a specific recommendation. Invest in explainability tooling (SHAP, LIME) from the start.
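To give a feel for what a per-prediction explanation looks like, the sketch below computes feature contributions for a plain linear model: each feature's weight times its distance from a baseline value. For a linear model with independent features, these values coincide with SHAP values, but this is not the API of the SHAP or LIME libraries, and it does not carry over to non-linear models. The function name `linear_contributions` and the example features are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def linear_contributions(
    weights: Dict[str, float],
    baseline: Dict[str, float],
    row: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Per-feature contribution to (prediction - baseline prediction), largest first."""
    contributions = {
        name: weights[name] * (row[name] - baseline[name]) for name in weights
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical churn-score model; the baseline holds average feature values.
weights = {"support_tickets": 2.0, "months_active": -1.0}
baseline = {"support_tickets": 1.0, "months_active": 1.0}
customer = {"support_tickets": 3.0, "months_active": 0.0}

for feature, contribution in linear_contributions(weights, baseline, customer):
    print(f"{feature}: {contribution:+.1f}")
```

A ranked list like this is something a business user can read directly: it says which inputs pushed this particular score up or down, and by how much.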
4. The Cold Start Problem
New users, new products, and new markets all arrive with little or no history, and every ML system struggles with sparse data. Design fallback strategies, such as popularity-based defaults, before you need them.
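One possible fallback strategy is to serve popular items until a user has enough history for the personalised model to be trusted. The sketch below assumes a recommendation setting. The `recommend` function, the `personalised` callable, and the `min_interactions` threshold are hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, List, Sequence


def recommend(
    user_id: str,
    history: Dict[str, Sequence[str]],
    personalised: Callable[[str, Sequence[str]], List[str]],
    popular_items: Sequence[str],
    min_interactions: int = 5,
    k: int = 3,
) -> List[str]:
    """Use the personalised model only when the user has enough history."""
    seen = history.get(user_id, ())
    if len(seen) >= min_interactions:
        ranked = list(personalised(user_id, seen))
    else:
        ranked = []  # cold start: skip the model entirely
    # Pad with popular items the user has not seen and that are not already listed.
    for item in popular_items:
        if len(ranked) >= k:
            break
        if item not in ranked and item not in seen:
            ranked.append(item)
    return ranked[:k]
```

The same padding loop also covers the case where the personalised model returns fewer than `k` items, so callers always get a full list as long as enough popular items exist.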
Our Production ML Stack
- Model Serving: TorchServe / FastAPI with horizontal scaling
- Monitoring: Evidently AI for data drift, Prometheus for infrastructure
- Experiment Tracking: MLflow for reproducibility
- Feature Store: Centralised, versioned feature definitions shared across models
The Business Bottom Line
The organisations that win with ML are not those that build the most sophisticated models in isolation. They are the ones that build robust, observable, maintainable ML systems that improve continuously. That requires engineering discipline alongside data science brilliance.