Production ML for an automotive intelligence platform.
Sector: Automotive / mobility data
Period: 2022–2023
Disciplines: Prediction systems, Data engineering, Evaluation infra
The problem
The client operated an automotive intelligence platform whose value depended on the predictive accuracy of a small number of high-leverage models. Those models had grown organically. They were profitable. They were also brittle in ways the team could see in production but could not reliably reproduce in development.
The work
We worked across the prediction stack: feature pipelines, training infrastructure, model architecture, and the evaluation harness that decided when a model was ready for release. The first deliverable was not a new model. It was an honest account of what the existing ones were doing and where they failed.
From that baseline, we rebuilt the training pipeline so that runs were reproducible by anyone on the team, then introduced an evaluation harness whose results the engineers actually trusted. Subsequent architecture work was straightforward once those two pieces were in place.
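Reproducibility of the kind described above usually comes down to capturing, for every run, the exact data version and configuration that produced it. A minimal sketch of that idea, with hypothetical names (`run_manifest`, `data_version`) that are not the client's actual pipeline:

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

def run_manifest(config: dict, data_version: str) -> dict:
    """Capture enough provenance to reproduce a training run.

    The deterministic run_id is a hash of the config plus the data
    version, so two people launching the same run get the same id.
    Field names here are illustrative, not the client's schema.
    """
    payload = json.dumps(
        {"config": config, "data_version": data_version},
        sort_keys=True,
    ).encode()
    return {
        "run_id": hashlib.sha256(payload).hexdigest()[:12],
        "data_version": data_version,
        "config": config,
        "python": platform.python_version(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

The useful property is that the run id is a pure function of the inputs: identical config and data version yield identical ids, so "reproducible by anyone on the team" becomes checkable rather than aspirational.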
What shipped
- A reproducible training pipeline with versioned data and full provenance
- An evaluation harness tied to the platform's actual production decisions
- Replacement models for the highest-leverage prediction tasks, with measurable improvements over the incumbents
- Documentation written for the engineers inheriting the work, not for the engagement
What we learned
The single highest-leverage intervention was the evaluation harness. It converted a category of arguments — is this version actually better? — into a category of measurements. The architecture work that followed was easier to do well because the team had a way to disagree with us on the basis of evidence.
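The core of such a harness can be a single explicit release gate: a candidate ships only if it beats the incumbent on a production-relevant metric by a stated margin. A minimal sketch, where the metric, margin, and names are illustrative assumptions rather than the client's actual thresholds:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    model: str
    error: float  # e.g. mean absolute error on a held-out production slice

def release_gate(
    incumbent: EvalResult,
    candidate: EvalResult,
    min_improvement: float = 0.02,  # require a 2% relative error reduction
) -> bool:
    """Return True only if the candidate clearly beats the incumbent.

    Lower error is better. The margin keeps noise-level "wins" from
    triggering a release; its value here is a placeholder.
    """
    return candidate.error <= incumbent.error * (1.0 - min_improvement)
```

Encoding the decision rule this way is what converts the argument into a measurement: anyone who disagrees with a release has to dispute either the metric or the margin, both of which are written down.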