The Care and Feeding of Data Scientists by Katie Malone, Michelangelo D'Agostino

Chapter 1. Your Mission, Should You Choose to Accept It

So here you are, pondering the running of a data science team. Maybe you’re a data scientist who has risen through the ranks and been put in charge of a growing team. Maybe you’re an engineering or product leader tasked with building a new data science function from scratch. Or maybe you’re being proactive, sizing up what data science management looks like and figuring out whether it’s for you someday. To do your mission well, you’ll need to understand how to balance and structure a data science team, how to recruit and interview the best people to fill that team, and how to keep them productive and happy after they’re in place. It can be a daunting task, but this report is here to help. And luckily, it won’t self-destruct at the end. Over the following pages, we cover each of these topics in turn, with a focus on concrete, actionable tips that you can actually use.

To begin, it’s crucial to recognize that there are several different flavors of data scientist who generally will fill different niches within your organization and need to be managed differently. Your first job as a data science leader is to quickly take stock of your team (or the team you’re tasked with building), match it up against the goals of your organization, and target your efforts around building the right kind of data science team. In this section, we cover the different flavors as well as thoughts on how to structure your team within the context of the rest of the organization.

The Periodic Table of Data Scientists

“Data scientist” is an overloaded term. Different data scientists have different strengths, interests, and ways in which they contribute to your organization. Breaking the role apart into a few types of specialist archetypes helps your data scientists to understand how they should approach their work and can clarify expectations to make everyone happier. You don’t necessarily need to formalize these specialist roles and titles for your team (some data scientists don’t want to be pigeonholed), but being aware of these different archetypes will help you better understand your team and mentor them on career development.

If your team is small, young, or just starting out, this might be more detail than you need to be successful: don’t overthink your roles. However, these distinctions become important as your data scientists become more senior and you need to give them structured thoughts on what kinds of opportunities will help them grow (see Chapter 6 on career ladders for more on this).

Operational Data Scientists

Some data scientists are operations oriented in the sense that their strength is applying data science to the everyday functioning of your business. An operational data scientist focuses on optimizing the organization’s decisions, using data, so that business goals are met more effectively. They work closely with business stakeholders, understanding what they do and helping them to do it better with data. An operational data scientist is a great communicator of technical concepts to nontechnical colleagues and helps translate business goals into data science models and metrics. They’re also fluent in all of the data systems that power the organization, from customer relationship management to enterprise resource planning to marketing automation. They know what the company’s top-line goals are for the year and write slide decks and applications with an eye toward getting recommendations adopted by the organization. At a certain level of seniority, an operational data scientist begins to look like a strategist, marketer, logistics coordinator, or other functional leader—but they’re extremely effective because of their ability to support recommendations with data and models, not just intuition.

Although there can be overlap between an operational data scientist and someone in an analytics or business intelligence role, there are crucial differences. Like the product-focused data scientist, whom we look at in the next section, an operational data scientist is more likely than an analyst to be doing modeling and engineering work. Unlike the product-focused data scientist, though, the operational data scientist builds models and code that are inward facing, analyzing and optimizing the function of the business itself, and their customers are internal teams such as sales or marketing rather than external customers.

Product-Focused Data Scientists

Another type of data scientist is the product-focused data scientist. This flavor of data scientist works closely with a product team, and their work might even be embedded within the product itself, using data science and machine learning to increase its value. For example, they might build models that power recommendations on the company website or app, or they might work with the product designers to answer questions like, “How do we know our new product will fill a need in the market?”, “What feature should we prioritize on our roadmap?”, or “How will we measure whether this feature actually works?” A product-focused data scientist needs to be business savvy, like the operational data scientist, and keep macroscopic company goals in mind, but they also need to be more technically rigorous because of the quality and quantity of data that powers many modern products. Their code and models are more likely to reach the organization’s customers via the product, and, as a result, they are deeply user centric.

Engineering Data Scientists

A third way a data scientist adds value to an organization is via engineering work. The engineering data scientist builds and maintains the systems that power the work of the product or operational data scientists. They run production code and machine learning systems smoothly and efficiently, processing and analyzing datasets at scale and solving data-intensive problems. An engineering data scientist maintains a high standard of technical excellence, and in the case of a more senior engineering data scientist, they begin to play a role that looks a lot like a lead or principal engineer. Increasingly, the name of this role in the market is morphing to machine learning engineer or data science engineer, signaling the technical depth required of someone doing this flavor of data science job. Technical project management and leadership and code review skills are crucial for this role.

Research Data Scientists

Finally, some data scientists are expected to play more of a purely research role. A research data scientist is solely tasked with advancing the state of the art, often in a field like deep learning, computer vision, or natural language processing, without any explicit expectation that their work will be immediately useful to the company. Be careful and use this role sparingly: it’s very unlikely that your team is a pure research data science team. Typically, these roles are more like the research scientist roles found at places like Google or Microsoft Research, and they are staffed (often exclusively) with PhDs. Very few organizations have the need for such a team or the structure to support it. Nonetheless, it’s probably the most glamorous kind of data scientist, and it might be the role that people on your team expect to play. Manage those expectations carefully.

Another good way to think about these flavors is with the “Type A” versus “Type B” dichotomy. In Type A, the “A” is for “analysis,” mapping nicely onto the operational data scientist as we’ve described them. In Type B, the “B” is for “building,” and this maps to the product, engineering, and research data scientists.

The Structure of Scientific Teams

Although many companies know that they “should have a data science team,” there’s often no clear consensus on what that team should do or how it should interact with the rest of the organization. One of the best ways to cut through that fog is to use these archetypes to align your data scientists with more traditional departments and roles like operations and marketing, product and engineering, or research. That will help your less data-savvy colleagues understand the goal of the data science program and how to interact with it, which will make you more successful.

Given these different flavors, should you seek to balance your team with different kinds of specialists or load it with generalist “full-stack data scientists”? The specialist-versus-generalist debate continues to rage, but as we explain later in Chapter 6 on career ladders, we prefer the philosophy of the T-shaped data scientist. Look to hire folks who are broad enough to hack their way through the basics of each function but have (or will choose to develop) depth in a particular area that aligns with one of the flavors.

Data science is becoming pervasive enough that, in many organizations today, data scientists need to serve multiple parts of an organization. However, this introduces a thorny question: how should you distribute your data scientists? Should they all sit together in a core data science team or department? Should they sit with their functional teams, closer to the business or product problems? Or is there an in-between model that works?

A centralized data science team can lead to a strong sense of identity, program coherence, and knowledge sharing. Data scientists love to talk shop with one another, and they take pride in their work, so having them sit and work together leads to high team cohesion as they push each other to explore and try new things. An upside of this is high job satisfaction and, hopefully, good team culture and high retention. The downside of this model is that it places your data scientists further from the problems that need to be solved out in the rest of the business, making it easier for them to end up adrift or down rabbit holes. A data science team needs to be effective as well as happy.

Another upside of the centralized model is that decisions about which workstreams to tackle (and by whom and how) are more likely to be made by trained data scientists. Having one centralized team with a centralized way of handling requests from outside teams can help the entire company to ensure it’s using its precious resources efficiently.

A fully distributed data science team with the data scientists embedded within other business units or teams flips this dynamic around. The data scientists are closer to the actual work on the ground and gain crucial business context and awareness of the needs of their colleagues across the organization. However, even though the organization might benefit more from their work in this arrangement, the data scientists themselves often end up wishing for more like-minded people to help them think through data or software or methodological problems. Anecdotally, we’ve known a number of data scientists who have left organizations because of the isolation and lack of mentorship caused by the fully distributed model.

A good compromise is a hybrid of the two models. Sometimes this is called a center-of-excellence model, but whatever name it goes by, it’s characterized by a central data science team that pools best practices and shares ideas while the data scientists themselves are primarily “loaned out” or embedded within functional teams for “tours of duty.” In other words, although the data scientists spend most of their time accountable to the organization more broadly, there are structured chances for them to get together, share ideas, talk about what interesting problems they’ve uncovered, and generally create some of that culture that makes the centralized model so appealing. In a product or engineering-focused organization, this model can look like a centralized data science team where members are embedded in product or engineering teams, attend planning meetings and standups, and even sit with the team for the life of a project, but eventually return to their home base.

One thing to consider is how your team structure might evolve as your team grows and scales. When a team is just starting out, it’s unlikely that a fully distributed model will be successful, because the handful of data scientists you have will end up isolated and without mentorship. But at Facebook or Google scale, there will likely be enough data scientists to spread around that isolation isn’t a real factor. If you’re somewhere in the middle, we recommend the hybrid model.
