A recurring concern I hear from heads of data science teams is that their data scientists spend too much time preparing data for analysis rather than doing actual analysis. As one hedge fund manager in London told me, “I don’t pay my quants to ‘fiddle’ with their data – I pay them to find alpha and manage risk.”
Various studies have found that data scientists spend as much as 80% of their time getting data ready for analysis. Why is so much time (and money) spent on data preparation? There are several persistent reasons:
- Data prep includes many tasks:
  - Collecting data from various sources
  - Cleaning data (handling missing values, fixing errors)
  - Normalizing / transforming data
  - Joining data sets
  - Ensuring data quality and consistency
- Real-world data is messy: it’s incomplete, inconsistent, and often poorly documented.
- Data integration is complex because data comes from multiple systems (e.g. flat files, databases, APIs).
- Automated tools help, but only so much. Despite steady improvements, data prep remains largely manual or semi-automated.
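The Python sketch below makes the task list above concrete. It walks through the collect, clean, transform, join, and quality-check steps using only the standard library. The sample data, the field names, and the choice to fill a missing price with that ticker’s mean are illustrative assumptions, not a recommended method. Real pipelines typically rely on tools such as pandas or SQL.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source 1: a flat-file export (note the stray space and the missing close)
PRICES_CSV = """ticker,date,close
AAPL ,2024-01-02,185.6
MSFT,2024-01-02,
AAPL,2024-01-03,184.3
MSFT,2024-01-03,370.9
"""

# Hypothetical source 2: sector metadata, as it might arrive from an API
SECTORS = [
    {"ticker": "AAPL", "sector": "Technology"},
    {"ticker": "MSFT", "sector": "Technology"},
]


def collect(text):
    """Collect: parse a flat-file export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: trim whitespace, convert types, and fill a missing close with
    that ticker's mean observed close (one simple, assumed imputation choice)."""
    for r in rows:
        r["ticker"] = r["ticker"].strip().upper()
        r["close"] = float(r["close"]) if r["close"].strip() else None
    observed = defaultdict(list)
    for r in rows:
        if r["close"] is not None:
            observed[r["ticker"]].append(r["close"])
    for r in rows:
        vals = observed.get(r["ticker"])
        if r["close"] is None and vals:
            r["close"] = sum(vals) / len(vals)
    return rows


def normalize(rows):
    """Transform: min-max scale each ticker's closes to the range [0, 1]."""
    by_ticker = defaultdict(list)
    for r in rows:
        if r["close"] is not None:
            by_ticker[r["ticker"]].append(r["close"])
    for r in rows:
        vals = by_ticker.get(r["ticker"])
        if r["close"] is None or not vals:
            r["close_scaled"] = None
            continue
        lo, hi = min(vals), max(vals)
        r["close_scaled"] = (r["close"] - lo) / (hi - lo) if hi > lo else 0.0
    return rows


def join(rows, sectors):
    """Join: attach sector metadata by ticker (left join; unmatched -> None)."""
    lookup = {s["ticker"]: s["sector"] for s in sectors}
    return [{**r, "sector": lookup.get(r["ticker"])} for r in rows]


def quality_issues(rows):
    """Quality check: report any rows that still have gaps after the steps above."""
    issues = []
    for i, r in enumerate(rows):
        for field in ("close", "sector"):
            if r.get(field) is None:
                issues.append(f"row {i}: missing {field}")
    return issues


if __name__ == "__main__":
    data = join(normalize(clean(collect(PRICES_CSV))), SECTORS)
    for row in data:
        print(row)
    print("issues:", quality_issues(data) or "none")
```

Even in this toy version, most of the lines handle data problems rather than analysis. That imbalance is the one the heads of data science describe.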
So what can be done? In conversations with several heads of data science about how they tackle this problem, I heard a rough consensus: improving a data science team’s productivity means balancing technical rigor with strategic focus, streamlined workflows, and effective collaboration.
Here are eight steps you can take to improve your data science team’s productivity:
#1 – Align Research with Business Objectives.
Ensure that your data scientists understand how their analysis and research tie into your firm’s goals. Clear alignment with profit centers and risk appetite focuses their efforts on the most impactful areas. Involving them in strategy discussions early fosters accountability and sharper insights.
#2 – Streamline Data Access and Infrastructure.
A major bottleneck in any data scientist’s work is slow or fragmented access to data. Invest in a centralized, well-maintained data infrastructure with robust APIs, automated data cleaning, and low-latency access. Equip the team with fast computing environments and version-controlled research platforms to support iterative experimentation.
#3 – Encourage Model Reusability and Modular Development.
Build a shared library of tools, frameworks, and statistical functions. Avoid reinventing the wheel: modular, reusable components cut development time and improve consistency. Encourage documentation and standardized coding practices to make onboarding and cross-project collaboration easier.
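Here is a hypothetical example of the kind of component such a shared library might hold: a function that computes an annualized Sharpe ratio. The module path `teamlib/stats.py` and the function name are invented for this sketch. The statistic matters less than the packaging: typed parameters, sensible defaults, a docstring, and explicit errors make a function safe for the whole team to reuse.

```python
"""Hypothetical shared module (e.g. teamlib/stats.py): one small, documented,
reusable statistical function."""
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def annualized_sharpe(
    returns: Sequence[float],
    risk_free_rate: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Return the annualized Sharpe ratio of periodic returns.

    Args:
        returns: periodic (e.g. daily) simple returns.
        risk_free_rate: per-period risk-free rate subtracted from each return.
        periods_per_year: 252 for daily trading data, 12 for monthly, etc.

    Raises:
        ValueError: if fewer than two returns are given or volatility is zero.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_rate for r in returns]
    vol = stdev(excess)  # sample standard deviation of excess returns
    if vol == 0:
        raise ValueError("zero volatility: Sharpe ratio undefined")
    # Mean excess return per unit of volatility, scaled by sqrt(periods per year)
    return mean(excess) / vol * sqrt(periods_per_year)
```

Once a function like this lives in one reviewed place, every project computes the metric the same way. Fixes and improvements then reach everyone at once.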
#4 – Foster a Culture of Review and Risk Awareness.
Rigorous peer reviews, model validation, and stress testing are essential in data science. Promote a culture where assumptions are challenged constructively and models are scrutinized for robustness, not just performance. This improves both the reliability of output and collective team learning.
#5 – Prioritize Communication and Clarity.
Your data scientists must translate complex models into actionable insights for various stakeholders. Encourage concise, data-driven communication with visual aids and plain-language summaries. Strong communicators bridge the gap between research and decision-making.
#6 – Balance Innovation with Execution.
While research is inherently exploratory, productivity improves when projects are broken into manageable, testable milestones. Maintain a research backlog, use agile principles for iteration, and prioritize projects with clear potential for revenue generation or cost savings.
#7 – Support Learning and Cross-Training.
The field evolves rapidly. Encourage continuous learning in areas such as machine learning, model validation and testing, and communicating findings. Cross-training fosters versatility and resilience, so that no single person becomes a bottleneck.
#8 – Minimize Distractions and Administrative Overhead.
Protect your team from unnecessary meetings, reporting, or task-switching. Allow long, uninterrupted blocks of time for deep analytical work.
The Takeaway.
Your data science team can significantly boost its productivity and generate sustained competitive advantage by creating an environment where strategic focus, technical excellence, and collaboration thrive.
What about you? Is your data science team as productive as you’d like? Have you found ways to boost their productivity? Please comment – I’d love to hear your thoughts.
Also, please connect with DIH on LinkedIn and Twitter.
Thanks,
Tom Myers