Production ML for an automotive intelligence platform.
Sector: Automotive / mobility data
Period: 2022–2023
Disciplines: Prediction systems, Data engineering, Evaluation infra
The problem
The client operated an automotive intelligence platform whose value depended on the predictive accuracy of a small number of high-leverage models. Those models had grown organically. They were profitable. They were also brittle in ways the team could see in production but could not reliably reproduce in development.
The work
We worked across the prediction stack: feature pipelines, training infrastructure, model architecture, and the evaluation harness that decided when a model was ready for release. The first deliverable was not a new model. It was an honest account of what the existing ones were doing and where they failed.
From that baseline, we rebuilt the training pipeline so that runs were reproducible by anyone on the team, then introduced an evaluation harness whose results the engineers actually trusted. Subsequent architecture work was straightforward once those two pieces were in place.
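Reproducibility of the kind described above usually comes down to capturing, for every run, the exact data version and configuration that produced it. A minimal sketch of that idea, with hypothetical names (`run_manifest`, `data_version`) that are not the client's actual pipeline:

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

def run_manifest(config: dict, data_version: str) -> dict:
    """Capture enough provenance to reproduce a training run.

    The deterministic run_id is a hash of the config plus the data
    version, so two people launching the same run get the same id.
    Field names here are illustrative, not the client's schema.
    """
    payload = json.dumps(
        {"config": config, "data_version": data_version},
        sort_keys=True,
    ).encode()
    return {
        "run_id": hashlib.sha256(payload).hexdigest()[:12],
        "data_version": data_version,
        "config": config,
        "python": platform.python_version(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

The useful property is that the run id is a pure function of the inputs: identical config and data version yield identical ids, so "reproducible by anyone on the team" becomes checkable rather than aspirational.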
What shipped
- A reproducible training pipeline with versioned data and full provenance
- An evaluation harness tied to the platform's actual production decisions
- Replacement models for the highest-leverage prediction tasks, with measurable improvements over the incumbents
- Documentation written for the engineers inheriting the work, not for the engagement
What we learned
The single highest-leverage intervention was the evaluation harness. It converted a category of arguments — is this version actually better? — into a category of measurements. The architecture work that followed was easier to do well because the team had a way to disagree with us on the basis of evidence.
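The core of such a harness can be a single explicit release gate: a candidate ships only if it beats the incumbent on a production-relevant metric by a stated margin. A minimal sketch, where the metric, margin, and names are illustrative assumptions rather than the client's actual thresholds:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    model: str
    error: float  # e.g. mean absolute error on a held-out production slice

def release_gate(
    incumbent: EvalResult,
    candidate: EvalResult,
    min_improvement: float = 0.02,  # require a 2% relative error reduction
) -> bool:
    """Return True only if the candidate clearly beats the incumbent.

    Lower error is better. The margin keeps noise-level "wins" from
    triggering a release; its value here is a placeholder.
    """
    return candidate.error <= incumbent.error * (1.0 - min_improvement)
```

Encoding the decision rule this way is what converts the argument into a measurement: anyone who disagrees with a release has to dispute either the metric or the margin, both of which are written down.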