Chapter 1. Your Mission, Should You Choose to Accept It
So here you are, pondering the running of a data science team. Maybe you're a data scientist who has risen through the ranks and been put in charge of a growing team. Maybe you're an engineering or product leader tasked with building a new data science function from scratch. Or maybe you're being proactive, sizing up what data science management looks like and figuring out whether it's for you some day. To accomplish your mission, you'll need to understand how to balance and structure a data science team, how to recruit and interview the best people to fill that team, and how to keep them productive and happy after they're in place. It can be a daunting task, but this report is here to help. And luckily, it won't self-destruct at the end. Over the following pages, we cover each of these topics in turn, with a focus on concrete, actionable tips that you can actually use.
To begin, it's crucial to recognize that there are several different flavors of data scientist who generally fill different niches within your organization and need to be managed differently. Your first job as a data science leader is to quickly take stock of your team (or the team you're tasked with building), match it up against the goals of your organization, and target your efforts around building the right kind of data science team. In this section, we cover the different flavors as well as thoughts on how to structure your team within the context of the rest of the organization.
The Periodic Table of Data Scientists
"Data scientist" is an overloaded term. Different data scientists have different strengths, interests, and ways in which they contribute to your organization. Breaking the role into a few specialist archetypes helps your data scientists understand how to approach their work and clarifies expectations, which makes everyone happier. You don't necessarily need to formalize these specialist roles and titles for your team (some data scientists don't want to be pigeonholed), but being aware of these archetypes will help you better understand your team and mentor them on career development.
If your team is small, young, or just starting out, this might be more detail than you need to be successful: don't overthink your roles. However, these distinctions become important as your data scientists become more senior and you need to give them structured guidance on which kinds of opportunities will help them grow (see Chapter 6 on career ladders for more on this).
Operational Data Scientists
Some data scientists are operations oriented in the sense that their strength is applying data science to the everyday functioning of your business. An operational data scientist focuses on optimizing the organization's decisions, using data, so that business goals are met more effectively. They work closely with business stakeholders, understanding what they do and helping them to do it better with data. An operational data scientist is a great communicator of technical concepts to nontechnical colleagues and helps translate business goals into data science models and metrics. They're also fluent in all of the data systems that power the organization, from customer relationship management to enterprise resource planning to marketing automation. They know what the company's top-line goals are for the year and write slide decks and applications with an eye toward getting recommendations adopted by the organization. At a certain level of seniority, an operational data scientist begins to look like a strategist, marketer, logistics coordinator, or other functional leader, but they're extremely effective because of their ability to support recommendations with data and models, not just intuition.
Although there can be overlap between an operational data scientist and someone in an analytics or business intelligence role, there are crucial differences. Like the product-focused data scientist, whom we look at in the next section, an operational data scientist is more likely than an analyst to be doing modeling and engineering work. Unlike the product-focused data scientist, their models and code are inward facing, analyzing and optimizing the function of the business itself, and their customers are internal teams such as sales or marketing rather than external customers.
Product-Focused Data Scientists
Another type of data scientist is the product-focused data scientist. This flavor of data scientist works closely with a product team, and their work might even be embedded within the product itself, using data science and machine learning to increase its value. For example, they might build models that power recommendations on the company website or app, or they might work with the product designers to answer questions like "How do we know our new product will fill a need in the market?", "What feature should we prioritize on our roadmap?", or "How will we measure whether this feature actually works?" A product-focused data scientist needs to be business savvy, like the operational data scientist, and keep macroscopic company goals in mind, but they also need to be more technically rigorous because of the quality and quantity of data that powers many modern products. Their code and models are more likely to reach the organization's customers via the product, and, as a result, they are deeply user centric.
Engineering Data Scientists
A third way a data scientist adds value to an organization is via engineering work. The engineering data scientist builds and maintains the systems that power the work of the product or operational data scientists. They run production code and machine learning systems smoothly and efficiently, processing and analyzing datasets at scale and solving data-intensive problems. An engineering data scientist maintains a high standard of technical excellence, and a more senior engineering data scientist begins to play a role that looks a lot like a lead or principal engineer. Increasingly, the market is renaming this role machine learning engineer or data science engineer, signaling the technical depth required of someone doing this flavor of data science job. Technical project management, leadership, and code review skills are crucial for this role.
Research Data Scientists
Finally, some data scientists are expected to play more of a purely research role. A research data scientist is solely tasked with advancing the state of the art, often in a field like deep learning, computer vision, or natural language processing, without any explicit expectation that their work will be immediately useful to the company. Be careful and use this role sparingly: it's very unlikely that your team is a pure research data science team. Typically, these roles resemble the research scientist roles found at places like Google or Microsoft Research, which are staffed (often exclusively) with PhDs. Very few organizations need such a team or have the structure to support it. Nonetheless, it's probably the most glamorous kind of data scientist, and it might be the expectation of people on your team. Manage those expectations carefully.
Another good way to think about these flavors is with the "Type A" versus "Type B" dichotomy. In Type A, the "A" is for "analysis," mapping nicely onto the operational data scientist as we've described them. In Type B, the "B" is for "building," and this maps to the product, engineering, and research data scientists.
The Structure of Scientific Teams
Although many companies know that they "should have a data science team," there's often no clear consensus on what that team should do or how it should interact with the rest of the organization. One of the best ways to cut through that fog is to use these archetypes to align your data scientists with more traditional departments and roles like operations and marketing, product and engineering, or research. That will help your less data-savvy colleagues understand the goal of the data science program and how to interact with it, which will make you more successful.
Given these different flavors, should you seek to balance your team with different kinds of specialists or load it with generalist "full-stack data scientists"? The specialist-versus-generalist debate continues to rage, but as we explain later in Chapter 6 on career ladders, we prefer the philosophy of the T-shaped data scientist. Look to hire folks who are broad enough to hack their way through the basics of each function but have (or will choose to develop) depth in a particular area that aligns with one of the flavors.
Data science is becoming pervasive enough that, in many organizations today, data scientists need to serve multiple parts of the organization. This raises a thorny question: how should you distribute your data scientists? Should they all sit together in a core data science team or department? Should they sit with their functional teams, closer to the business or product problems? Or is there an in-between model that works?
A centralized data science team can lead to a strong sense of identity, program coherence, and knowledge sharing. Data scientists love to talk shop with one another, and they take pride in their work, so having them sit and work together leads to high team cohesion as they push each other to explore and try new things. An upside of this is high job satisfaction and, hopefully, good team culture and high retention. The downside of this model is that it places your data scientists further from the problems that need to be solved out in the rest of the business, making it easier for them to end up adrift or down rabbit holes. A data science team needs to be effective as well as happy.
Another upside of the centralized model is that decisions about which workstreams to tackle (and by whom and how) are more likely to be made by trained data scientists. Having one centralized team with a centralized way of handling requests from outside teams can help the entire company ensure it's using its precious resources efficiently.
A fully distributed data science team with the data scientists embedded within other business units or teams flips this dynamic around. The data scientists are closer to the actual work on the ground and gain crucial business context and awareness of the needs of their colleagues across the organization. However, even though the organization might benefit more from their work in this arrangement, the data scientists themselves often end up wishing for more like-minded people to help them think through data, software, or methodological problems. Anecdotally, we've known a number of data scientists who have left organizations because of the isolation and lack of mentorship caused by the fully distributed model.
A good compromise is a hybrid of the two models. Sometimes this is called a center-of-excellence model, but whatever name it goes by, it's characterized by a central data science team that pools best practices and shares ideas while the data scientists themselves are primarily "loaned out" or embedded within functional teams for "tours of duty." In other words, although the data scientists spend most of their time accountable to the organization more broadly, there are structured chances for them to get together, share ideas, talk about the interesting problems they've uncovered, and generally create some of the culture that makes the centralized model so appealing. In a product- or engineering-focused organization, this model can look like a centralized data science team whose members are embedded in product or engineering teams, attend planning meetings and standups, and even sit with the team for the life of a project, but eventually return to their home base.
One thing to consider is how your team structure might evolve as your team grows and scales. When a team is just starting out, it's unlikely that a fully distributed model will be successful, for the reasons we've already mentioned. But at Facebook or Google scale, there will likely be enough data scientists to go around that loneliness isn't really a factor. If you're somewhere in the middle, we recommend the hybrid model.