Building Data Science Teams: A Practical Guide

December 18, 2019 Lu Wang

Data science is hard, and data scientists are hard to hire. Data science teams often arrive at traditional companies with great fanfare and even greater promises. But without the right structure in place, you may find your team’s data quality demands surprising, their work outputs difficult to interpret, and their models difficult to deploy. If you've been tasked with building a data science team, it's critical to make sure that all roles and responsibilities are clearly defined—and your team is set up to succeed.

Building Data Science Teams: Content Summary
What Does a Good Data Science Team Look Like?
Determining Demand For Your Data Science Team
Investing in the Path To Data Science
Failure Modes and Their Root Causes
Conclusion

Summary

Data science initiatives fail when data management and data quality are poor.

Without the support of data analysts and data engineers, data scientists will not be able to get your data into the format they need to do real data science and successfully deploy the solutions that they build.

To unlock the most value from your data science strategy and initiatives, focus on filling all required functional roles—and defining intentional relationships between your data science team and non-data science stakeholders in other teams.

By reading this article, you will find out:

  • What data science team roles you’ll need to fill and what capabilities to look for
  • How to structure your data science team and its interfaces with engineering and stakeholders
  • How to triage your data science team’s problems and connect them to personnel gaps

What Does a Good Data Science Team Look Like?

Let’s start with a comprehensive view of what a data science department looks like.

A data science team helps the company by assisting in data-driven decision making, improving operations, monitoring KPIs, and automating complex processes. One good data science team can serve several external teams. Working with them feels like happiness.

Structurally, there are five data science team roles you’ll need to fill. You’ll also need to build three external relationships.

1. Analytics Lead: Your analytics lead will prioritize, structure, and frame initiatives according to their business, strategic, or monetary impact and manage data team resources. He or she will receive results with interpretations from your data scientists and analysts—and interface with executive stakeholders, using these results to support data-driven decisions, operational improvements, and process and outcome monitoring.

2. Data Engineer: Your data engineer will provide properly cleaned and transformed data to the rest of the data science team. He or she will also coordinate with the rest of your engineering team to make sure your data science team has access to the right databases, servers, and tools.

3. Data Scientist: Your data scientist will determine which problems are suitable for modeling approaches. He or she will perform data modeling tasks and interpret the corresponding results.

4-5. Analyst Team: Your analyst team will carry out analyses. They’ll produce descriptive statistics and organize, manage, and analyze any A/B tests the data science team conducts. Reporting analysts may prepare outputs for external teams (and can serve several different external teams). The analyst team is staffed with the following roles:

  • Quantitative Data Analyst: Your quantitative data analyst will create data assets that are used by the data science team and will often be in charge of experiments and statistical analyses.
  • Reporting Analyst: Your reporting analyst will produce reports and other data assets that serve as inputs for other teams’ operations.

Here’s how this looks in practice:

chart: data science team roles and responsibilities

Determining Demand For Your Data Science Team

A successful data science team can serve several external teams. Your data science team is functioning well if it enables external teams to make operational and process improvements by, for example, using models to automate key processes, absorb increased complexity without requiring more operational resources, and reduce the time it takes to produce KPI reporting.

At the beginning, one person may fill multiple roles. But if your data science team does not have a data engineer, an analytics lead, and a reporting analyst, you’ll find that its ability to produce work and support the business will be severely limited. These are the roles that you should always fill first, because even without data science capabilities, you can define and organize useful data assets. So if you start by hiring a data scientist, make sure they have ample practical experience in these domains and have the results to prove it. Ask how they strategized for actual business impact, what projects they deprioritized, and how they prepared data at scale.
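As a rough illustration of the "fill these roles first" rule, the sketch below checks whether a small team covers the three core roles even when one person fills several of them. The function name `missing_core_roles`, the `team` mapping, and the example hire are hypothetical and are not part of the original article.

```python
# The three roles the article says to fill first.
CORE_ROLES = {"analytics lead", "data engineer", "reporting analyst"}


def missing_core_roles(team):
    """Return the core roles that nobody on the team covers, sorted by name.

    `team` maps a person's name to the set of roles that person fills,
    so one person can cover several roles at once.
    """
    covered = set()
    for roles in team.values():
        covered |= set(roles)
    return sorted(CORE_ROLES - covered)


# Hypothetical early-stage team: a single hire who also does data engineering.
team = {"first_hire": {"data scientist", "data engineer"}}
print(missing_core_roles(team))  # ['analytics lead', 'reporting analyst']
```

Here the first hire would still need to act as analytics lead and reporting analyst, or those gaps would need to be filled by the next hires.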

It can take months to staff a data science team with individuals to fill each role. As a team scales, a basic ratio of people to maintain is 6 analysts to 2 data engineers to 1 data scientist to 1 analytics lead. The primary driver of scale is the number of external teams that the data science team needs to serve.
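To make the 6:2:1:1 ratio concrete, here is a minimal sketch that derives the other headcounts from a target number of analysts. The function name `staffing_for` and the choice to round partial heads up are assumptions made for illustration. The article gives only the ratio itself.

```python
# Ratio from the article: 6 analysts : 2 data engineers : 1 data scientist : 1 analytics lead.
RATIO = {"analyst": 6, "data engineer": 2, "data scientist": 1, "analytics lead": 1}


def staffing_for(analysts):
    """Return a headcount per role that keeps pace with `analysts`.

    Partial heads are rounded up (an assumption, not a rule from the article).
    """
    base = RATIO["analyst"]
    plan = {}
    for role, share in RATIO.items():
        # Integer ceiling division: smallest whole number >= analysts * share / base.
        plan[role] = -(-analysts * share // base)
    return plan


print(staffing_for(6))  # {'analyst': 6, 'data engineer': 2, 'data scientist': 1, 'analytics lead': 1}
print(staffing_for(9))  # {'analyst': 9, 'data engineer': 3, 'data scientist': 2, 'analytics lead': 2}
```

Rounding up means a team that has outgrown one full "unit" of the ratio gets its next data scientist and analytics lead early. A team that prefers to stretch its existing leads could round down instead.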

There is an important distinction between enabling external teams and absorbing their work. Few things destroy a data science team’s morale more than being responsible for an ever-ballooning backlog of reporting requests. It is the job of your analytics lead to make sure that the data science team’s resources are balanced between the completion of ad-hoc requests and the creation of strategic data science assets that can help the business and lead to better processes for other teams, including self-service reporting for the most frequently asked questions and their variations.

Investing in the Path To Data Science

So what are your options for building data science teams?

  1. Hire an analytics lead, a data analyst, and a data engineer, in that order.
  2. Contract an independent data scientist with the experience to assume multiple roles, then hire employees to fill in the rest of the team.
  3. Outsource the work to a boutique data science firm that can supply all the roles at once.

Given the eye-popping headlines about data science and AI salaries, it may come as a surprise to learn that there are so many options available for building functional data science capabilities. And while some may balk at the idea of outsourcing such a critical function, it’s worth pointing out that, for most organizations, data science is a new discipline. Though its promise spans many industries, it’s not yet so core to most businesses that the early phases can’t be outsourced.

Indeed, many companies outsource accounting (arguably more critical to company success) until they are big enough to achieve the ROI on an in-house team. The same can be said of data science. A key driver of data science hires should be the number of external teams that rely on the data science team. As long as that number is low, companies may experience better outcomes by utilizing consultants or firms with proven track records of delivering a team as a service. The company will enjoy the benefits of seeing how a well-run data science team functions and, subsequently, be more successful in running its own internal team once there’s sufficient demand for doing so.

There is another category of investments that can distract companies who are trying to expand their data science capabilities: software, tools, and platforms. A good team is so much more than these things. A good team has a leader who can assess its needs and determine which combination of tools best serves the company. A good team has resourceful and diligent people who can take practically any combination of tools and use them to produce work. You can buy a top-of-the-line laptop and pay for word processing software that is specifically designed to support writers, but it will not write a novel for you. Don’t expect your data science software to do the work for you. You still need people to use the tool for your company’s benefit.

Failure Modes and Their Root Causes

What are some common failure modes that bad data science team structures can produce?

| Failure Symptom | Root Cause |
| --- | --- |
| The data is not good enough for modeling | There is insufficient data engineering coverage, or a communications breakdown has occurred between the data engineer and the data scientist. |
| No one understands the models | Expectations have been poorly managed. This usually happens when a data scientist leads the team as a data scientist instead of as an analytics lead. A good analytics lead will not attempt to make external teams experts in data science. Instead, he or she will help external teams learn how to use model outputs and interface with the data science team. Most people can drive a car, even though most people do not know how cars work. Hire a good analytics lead. |
| No one can use the outputs of the models | Analyst team staffing is insufficient, leaving the company with few resources who can translate model outputs (which are often technical, stored within a database) into outputs that are suitable for operations. |
| The organization has rudimentary analytics capabilities but is struggling to derive additional benefits | Congratulations! The fact that your company is able to assess the marginal gains of its analytical initiatives means you have excellent analytics leadership. Time to hire a data scientist! |
| We don’t know the ROI of our analytics initiatives, or we suspect that it is negative | You need better analytical leadership. Review the workload of your analytics lead and make sure he or she has sufficient bandwidth and skills to assess ROI. |
| We are TOLD the ROI of our analytics initiatives is great, but we don’t have any evidence | This is unacceptable behavior for an analytical leader. |
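One way to put the table above to work is as a simple lookup from an observed symptom to the role worth reviewing first. The sketch below is only an illustration: the shortened symptom keys, the `TRIAGE` mapping, and the `roles_to_review` function are hypothetical paraphrases of the table, not wording or tooling from the article.

```python
# Symptom -> role to review first, paraphrased from the table above.
# The short symptom keys are hypothetical labels chosen for this sketch.
TRIAGE = {
    "data not good enough for modeling": "data engineer",
    "no one understands the models": "analytics lead",
    "no one can use the model outputs": "analyst team",
    "basic analytics works but gains have stalled": "data scientist",
    "roi unknown or suspected negative": "analytics lead",
    "roi claimed without evidence": "analytics lead",
}


def roles_to_review(symptoms):
    """Return the roles linked to the given symptoms, in order, without repeats.

    Symptoms that are not in TRIAGE are skipped rather than guessed at.
    """
    roles = []
    for symptom in symptoms:
        role = TRIAGE.get(symptom.strip().lower())
        if role is not None and role not in roles:
            roles.append(role)
    return roles


observed = ["No one understands the models", "ROI unknown or suspected negative"]
print(roles_to_review(observed))  # ['analytics lead']
```

When several symptoms point at the same role, as in this example, that is a reasonable place to start the staffing review.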

Conclusion

When your data science initiatives are built on a solid, functional team, you’ll be able not just to unlock better automation and data usage, but also to boost organizational alignment. Suddenly, you’ll be able to create a long-term view of initiatives without relying on tools and processes that are scattered across data silos. This will contextualize short-term fires and the excitement of a “cool new tool that will solve all our problems” against the project lifecycle and help the project and executive team understand the changes necessary to reach their business goals.

Furthermore, what’s good for your business is also good for your data science team. Analysts will get to tell stories with data for the business, which is rewarding. Engineers will get to use interesting tools and cross-domain thinking. Data scientists will have the support they need to do their best work and implement solutions for the business.


About the Author

Lu Wang

Lu Wang is the co-founder and CEO of Komodo Tech, a boutique data science consultancy that uses machine learning and data analytics to generate new value from existing data. She's helped digital marketers, developers, and operations and R&D teams get the information they need to improve their processes and make better, faster decisions.
