Not every company needs machine learning.

In fact, most companies I’ve worked with don’t have an ML problem. They have a collaboration problem.

We talk a lot about model performance, feature engineering, and MLOps. We talk far less about the friction between data engineers and data scientists that quietly derails projects long before a model reaches production. Same data. Different language. Different incentives. No wonder things get lost in translation.

The real bottleneck isn’t the algorithm

Data scientists are trained to ask, “What predicts the outcome best?”

Data engineers are trained to ask, “What will run reliably at scale?”

Both are right. Both are necessary. And both can unintentionally block each other.

I’ve seen data scientists prototype against perfectly shaped, notebook-friendly datasets that have no realistic path to production. I’ve seen data engineers build pristine pipelines that technically work but miss the nuance of what the model actually needs.

The result is predictable: frustration, rework, and eventually a quiet retreat to silos. The data scientist says, “Engineering slowed us down.” The data engineer says, “Research handed us something impossible to support.” Meanwhile, leadership wonders why the ML initiative hasn’t delivered value.

The issue is rarely competence. It’s translation.

Same data, different mental models

Data engineers think in terms of lineage, orchestration, SLAs, and schema evolution.

Data scientists think in terms of signal, bias, drift, and feature importance.

When these mental models aren’t made explicit, assumptions fill the gaps.

For example:

A data scientist assumes a feature can be refreshed daily because the historical extract was daily.
A data engineer knows the upstream system only updates weekly and has no reliable change data capture.
No one clarifies this until late in the project.

That mismatch compounds. Timelines slip. Trust erodes.

In organizations that get this right, I see three consistent behaviors:

  1. Data scientists involve data engineers before feature engineering gets too far.
  2. Data engineers sit in problem-framing sessions, not just implementation reviews.
  3. Both sides agree on what “production-ready” actually means before any serious build begins.

This is less about process and more about shared context.

A practical lesson learned

If you want to improve collaboration between data engineers and data scientists, start with a joint design session before any modeling begins.

In that session, align on four things:

  • What decision will this model influence?
  • What data sources are realistic and supportable?
  • What refresh cadence is required versus nice to have?
  • What does success look like in production, not just in a notebook?

Force the conversation early, when changing direction is cheap.

One of the most effective changes I’ve made as a data leader was instituting a simple rule: no model work without a data engineering partner identified upfront. Not for approval. For co-ownership.

It slowed down the first few projects slightly. It dramatically reduced rework later. More importantly, it shifted the culture from handoffs to shared accountability.

Machine learning is powerful. But it magnifies whatever organizational habits already exist. If collaboration is weak, ML will expose it. If communication is strong, ML becomes a force multiplier.

The takeaway

Before investing in more sophisticated models, it’s worth asking whether your data team is speaking the same language.

Where have you seen projects break down between engineering and science, and what actually fixed it?

Thanks,
Tom Myers

P.S. Please connect with DIH on LinkedIn.