Chapter 1. Choosing Your Data Science Team
There is a wide variety of things a data science team can do, both because the term “data science” is absurdly broad and because different companies have very different needs. Data science can mean a company having a single analyst who makes data dashboards for executives or a team of people creating production-ready machine learning APIs. A company could need help using data to define a business strategy, to make tailored customer experiences, to forecast investments, or something else entirely! Because there are so many different types of work a data science team might do, there are many different ways to staff the team to meet the needs of the business. This chapter covers how to think through who you should hire for your team.
Every data science team will naturally have data scientists on it, but in addition to the data scientists themselves, there are many potential supporting roles that could be included on the team. Having more positions on the team can give you a wider range of capabilities but also requires managing more types of work. Consider that a data science team needs to do all of the following to work effectively:
- Create a vision: Design a strategy for what the data scientists will work on by communicating with stakeholders about their needs and assessing the capabilities of the team.
- Project management: Keep track of what is being worked on and communicate when timeline issues arise.
- Stakeholder management: Work with stakeholders to help them understand the capabilities of data science and figure out data science opportunities.
- People management: Help the team through performance reviews, feedback, support, and other managerial tasks.
- Technical mentoring: Help more junior data scientists work through issues and resolve technical roadblocks.
- Data engineering (optional): Have data in a location that data scientists can then clean and use for their work, as well as locations for the outputs of models and analyses to be stored.
- Software engineering (optional): If necessary, write code that will call the models the data scientists build, such as the UI and backend that show the model results to customers.
- The data science work itself: This is indeed important as well.
That is a lot of tasks besides actual data science—and almost none of the work is optional! Not only that, but a lot of these tasks can be done by people with different roles. This creates a mathematical matching problem for how to design a team, as Figure 1-1 shows.
From Figure 1-1 we can see there are many possibilities for how to cover the needs of a data science team. For instance, you could choose to hire a data engineer to do any data engineering or have a software engineer do software engineering and data engineering (and struggle a bit with the latter). The problem becomes more complex because not all of these tasks need to be covered: you could ignore some and hope your team keeps running OK. Together there are many feasible solutions. So beyond the fact that your data science team is going to have data scientists on it, there are a number of different ways you can structure it, depending on some key decisions around who will be leading the team, who will be managing the projects, what sorts of technical roles will be on the team, and how much specialization you will have.
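To make the matching problem concrete, here is a minimal Python sketch that enumerates which combinations of roles cover a set of required tasks. The role-to-task mapping, the task labels, and the function name `covering_teams` are all hypothetical and chosen only for illustration; they are not taken from Figure 1-1, and a real team's mapping would differ.

```python
from itertools import combinations

# Hypothetical mapping of roles to the tasks each could plausibly cover.
# Illustrative only; not taken from Figure 1-1.
ROLE_TASKS = {
    "manager": {"vision", "project management",
                "stakeholder management", "people management"},
    "tech lead": {"vision", "technical mentoring", "data science"},
    "project manager": {"project management", "stakeholder management"},
    "data scientist": {"data science"},
    "data engineer": {"data engineering"},
    "software engineer": {"software engineering", "data engineering"},
}

# The tasks the chapter does not mark as optional.
REQUIRED = {"vision", "project management", "stakeholder management",
            "people management", "technical mentoring", "data science"}


def covering_teams(role_tasks, required, max_roles):
    """Return every combination of distinct roles (up to max_roles)
    whose combined tasks include all of the required tasks."""
    teams = []
    for size in range(1, max_roles + 1):
        for roles in combinations(sorted(role_tasks), size):
            covered = set().union(*(role_tasks[r] for r in roles))
            if required <= covered:
                teams.append(roles)
    return teams


if __name__ == "__main__":
    for team in covering_teams(ROLE_TASKS, REQUIRED, max_roles=3):
        print(team)
```

Under these assumed capabilities, every covering combination includes both a manager and a tech lead, and adding an optional task such as data engineering to the required set narrows the feasible teams further. Dropping a task from the required set, as the text notes you might, widens them again.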
How Are You Going to Split Up the Leadership-Style Tasks?
A data science team really needs two different types of leadership: managerial and technical. The managerial tasks are those you’d associate with a typical manager. These are tasks like creating the strategy for the team, meeting with stakeholders about requirements, deciding who to hire and promote, and so on. The technical leadership involves making the big technical decisions and helping the junior data scientists. These are tasks like deciding what kind of model architecture to use for a major project or mentoring a junior data scientist through their first analysis.
One option for running a team is to try and have a single person do both of these types of leadership work. However, generally, there are few people who are well qualified to do both the managerial leadership tasks and the technical leadership tasks. Even if a data science team does have a person with both skill sets, they’re likely to be too busy to do both of those tasks well. So it’s dicey to put all of this on a single person.
Thus, another option is for a team to have two separate roles, a manager and a technical lead. The manager works on all of the managerial tasks, including maintaining connections with the stakeholders. The technical lead comes up with the best technical solutions for the team to implement and helps mentor the team. Having two senior people is expensive, and if they aren’t working well together, it can create a host of problems. But when the two are working in sync, this is a great option that lets both leaders do what they do best.
Finally, a data science team can choose to have a leader who only focuses on one of those two types of leadership and accept that there will be gaps in what is managed. A data science team with a manager but no tech lead would need the data scientists on the team to be able to work without much mentorship, which usually means hiring more senior data scientists. A data science team with a tech lead but no manager would need to be able to trust that the other leaders at the company were still positioning their team well strategically. Both of these options are possible but not ideal. If the data science team is small, you might need to go with this option and then hope the rest of the team fills in the gaps.
If you are a leader reading this report, it is very important for you to understand your strengths and weaknesses and the best ways for you to support your team. If you come from a technical background and really wish part of your leadership job let you keep coding, then make sure you have strong support on the managerial component. If you do not have a technical background in data science, make sure when you hire data scientists, you have senior ones on your team and you listen to them. More than almost any other section of this report, the question “how are you going to do leadership-style tasks?” requires deep introspection and a willingness to share power and collaborate.
How Are You Going to Do Project Management?
Project management (organizing the tasks the team needs to accomplish and ensuring things are being completed on schedule) is critically important to ensuring that the team actually completes tasks at all. In addition to keeping the work itself organized, all parties must be kept informed and up to date on changes. Whether or not a person is explicitly assigned to the task of project management, this will inevitably end up being done by someone, somehow. Maybe you have a single person who has the sole job of project management, or maybe a data scientist will create an Excel spreadsheet of tasks, but someone somewhere on the team is doing project management.
Oftentimes the project management is being done by the same person who does the people management. This makes a lot of sense: the person who is managing the team and meeting with stakeholders is well suited to also keeping track of what is getting done and communicating it with others. Unfortunately, a manager is often quite busy with meetings and their own work, so the project management can fall by the wayside, to the detriment of the team as a whole. Having the manager of a team do the project management requires lots of discipline from the manager and works best when the team is small.
Alternatively, you can hire a person to do the project management (an aptly named project manager). This frees up the manager to focus on other tasks. A project manager may or may not have a technical background. Having a technical background makes it easier for them to understand what the data science tasks and blockers are, but a nontechnical person should still be able to communicate with the data scientists enough to understand what the situation is and how to help.
If the project manager is an entirely separate role on the team, it’s critically important that the project manager has organizational influence. There will be many times that the project manager will have to help the data scientists by going to other parts of the organization and fixing blockers like a lack of access to data. The project manager may also have to be firm with the data scientists—like telling them to stop working on the fun-but-noncritical task and instead finish the job that is needed by the end of day. If the project manager has influence within the organization, these sorts of situations will be possible to navigate (although still difficult). If the project manager does not have influence, it will be impossible. The best thing a team with a dedicated project manager can do to help them succeed is to back up the project manager and listen to them as much as they can.
As a leader of a large team, say around eight people or more, you’ll almost certainly need to hire a separate project manager. As a leader of a small team, say fewer than five people, you’ll probably not have the budget and will need to do it yourself. If you have a project manager, the best thing you can do for your team is to hire the right person for the job and empower them to do it well. If you’re doing it yourself, then the best thing you can do for your team is ensure you are consistently keeping up with project management.
Chapter 2 is focused on the methods for managing the data science work for a team, which is a primary focus for whoever is doing the project management. The chapter will discuss in more detail the topic of working with stakeholders and how to keep the work moving. Chapter 3 is about helping data scientists perform their best, which includes helping them take on their own project management. Strong project management is quite important to running a successful team.
Will Your Data Science Team Have Non–Data Science Technical Roles on It?
In some organizations, it makes sense to have non–data science roles on a data science team, like software engineers or data engineers. They can help get the data organized for the data scientists, write code that acts as wrappers around the models, and more. By having these types of people on the actual data science team, the data scientists will be well supported. The downside of having other engineering roles on the team is that an inconsistent amount of work might need to be done. For example, some weeks might not require any data engineering work, and it’s not clear what a data engineer should be doing in those weeks. There also might be organizational redundancies. For example, if the data science team has data engineers and the data engineering team has data scientists, who does what?
This sort of decision is largely outside the data science team itself. If you’re in an organization set up around projects, your team will likely have multiple roles. These organizations tend to be filled with teams of many different specialties who all closely work together to do projects. If you’re in an organization set up around roles, a pure data science team makes more sense. Role-based organizations tend to have departments for types of workers like an engineering department, a data science department, and so on.
Data engineering is a particularly common case for having a non–data science technical person on the team. In some organizations, there are entire data engineering teams dedicated to storing data and making it accessible to other teams, in which case your data science team won’t need data engineers. In other organizations, generally smaller ones, there aren’t teams dedicated to it, and thus the burden of maintaining databases and keeping them up to date will fall in other places, like possibly your data science team. But it’s worth thinking through data engineering in particular because without some sort of data backend, your data scientists cannot work. There are many tales of data scientists being hired at companies only to realize the company doesn’t have any data for them to work with. If you choose not to have data engineers on your team, make sure there is an infrastructure in place for your data scientists to get what they need.
Should the Data Scientists on the Team Specialize?
If your data scientists are specialized in particular areas, such as forecasting, experimentation, and optimization, you in theory should be able to accomplish more as a team since you’ll have more areas of knowledge covered. On the other hand, if everyone on your team acts as a data science generalist, so that any body of work assigned to the data science team can be done by any member of the team, you should have a much more robust group. A team of generalists will be able to review each other’s work better, handle teammates having time off or quitting, and chip in if projects require extra help. Together, this creates a push and pull between more specialization, so the team can accomplish more, and more generalization, so the team runs smoothly.
There are also different types of specialization:
- Technical topics like forecasting, experimentation, and optimization
- Domains like fraud, marketing, and logistics
- Particular parts of the company, like a particular dataset
Depending on the organization, there may be almost no risk to certain types of specialization. Having data scientists who are experts in marketing is probably just fine for a marketing company. But specializing in a particular technical topic or dataset within the business might lead to more situations where there isn’t enough work for the data scientist to do.
As a practice, with all other things being equal, generalization is probably better. On a day-to-day basis your data science team will succeed based on how smoothly it runs, not how many obscure techniques the team knows (see Chapter 4 for a more in-depth discussion of this). Further, your data scientists will naturally specialize on their own. Data scientists love learning new things, and so if a project requires concepts and methods that data scientists don’t know, they’ll go learn them. You don’t need to structure the whole team around only certain people knowing how to do certain things.
That said, there are situations that require such finesse that you’ll want specialists to do them. A classic example of this is optimization and reinforcement learning, which generally aren’t included in the standard data science curriculum and take a while to learn. As a data science team gets larger, you’ll find more situations where you feel obligated to organize the team so that some people have roles related to their specialty. But try to avoid this for as long as you can, and if you do hire specialists in a particular field, don’t hire just one. A single specialist won’t have anyone to bounce their ideas off of and, worse, no one to check that what they are doing is what they say they’re doing.
As you can see, there are countless ways to organize data science teams. There is no universally correct way to do so; it very much depends on the particulars of the company environment. Some situations may call for a single large team of specialists serving the entire company, while others require data scientists to be spread throughout. The data science teams may be large and filled with many roles like managers, project managers, and data engineers, or they may be small, with just a few data scientists and a technical leader. The role of a good data science leader is to understand the context of the organization and make the best decision for the team within it.
With a well-designed data science team, the data scientists should be ready to handle the work from stakeholders. Unfortunately, for most data science teams, the volume of demands put on them for deliverable work far exceeds the capacity of the team. In the next chapter, we’ll discuss how a data science team can manage their workflow and strategically think through what types of work to do.