Editor's note: This is the second in a three-part series of posts by Daniel Tunkelang dedicated to data science as a profession. In this series, Tunkelang will cover the recruiting, organization, and essential functions of data science teams.
When LinkedIn posted its first job opening for a "data scientist" in 2008, the company was clearly looking for generalists:
Be challenged at LinkedIn. We’re looking for superb analytical minds of all levels to expand our small team that will build some of the most innovative products at LinkedIn.
No specific technical skills are required (we’ll help you learn SQL, Python, and R). You should be extremely intelligent, have quantitative background, and be able to learn quickly and work independently. This is the perfect job for someone who’s really smart, driven, and extremely skilled at creatively solving problems. You’ll learn statistics, data mining, programming, and product design, but you’ve gotta start with what we can’t teach—intellectual sharpness and creativity.
In contrast, most of today's data scientist jobs require highly specific skills. Some employers require knowledge of a particular programming language or tool set. Others expect a Ph.D. and significant academic background in machine learning and statistics. And many employers prefer candidates with relevant domain experience.
If you are building a team of data scientists, should you hire generalists or specialists? As with most things, it depends. Consider the kinds of problems your company needs to solve, the size of your team, and your access to talent. But, most importantly, consider your company's stage of maturity.
Generalists add more value than specialists in a company’s early days, since you’re building most of your product from scratch and something is better than nothing. Your first classifier doesn't have to use deep learning to achieve game-changing results. Nor does your first recommender system need to use gradient-boosted decision trees. And a simple t-test will probably serve your A/B testing needs.
Hence, the person building the product doesn't need to have a Ph.D. in statistics or 10 years of experience working with machine learning algorithms. What's more useful in the early days is someone who can climb around the stack like a monkey and do whatever needs doing, whether it’s cleaning data or developing native mobile apps.
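To make the "simple t-test" point concrete, here is a minimal sketch of Welch's two-sample t-test using only Python's standard library. The function name and the sample numbers are hypothetical and purely illustrative. The sketch returns only the t statistic and approximate degrees of freedom; turning those into a p-value would need a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical per-user metric values for the two arms of an A/B test.
control = [12.1, 11.8, 12.4, 11.9, 12.0, 12.3, 11.7, 12.2]
variant = [12.6, 12.9, 12.4, 13.0, 12.7, 12.5, 12.8, 13.1]

t, df = welch_t_test(variant, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A generalist can write and sanity-check something like this in an afternoon, which is the point: at an early stage, a basic comparison of two groups usually answers the question being asked.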
How do you identify a good generalist? Ideally this is someone who has already worked with data sets large enough to have tested his or her ability to handle computational scale, data quality, and heterogeneity. Someone with a STEM background, whether through academic or on-the-job training, would be a good candidate. And someone who has demonstrated the ability and willingness to learn how to use tools and apply them appropriately would definitely get my attention. When I evaluate generalists, I ask them to walk me through projects that showcase their breadth.
Generalists hit a wall as your products mature: they’re great at developing the first version of a data product, but they don’t necessarily know how to improve it. In contrast, machine learning specialists can replace naive algorithms with better ones and continuously tune their systems. At this stage in a company’s growth, specialists help you squeeze additional opportunity from existing systems. If you're a Google or Amazon, those incremental improvements represent phenomenal value.
Similarly, having statistical expertise on staff becomes critical when you are running thousands of simultaneous experiments and worrying about interactions, novelty effects, and attribution. These are first-world problems, but they are precisely the kinds of problems that call for senior statisticians.
How do you identify a good specialist? Look for someone with deep experience in a particular area, like machine learning or experimentation. Not all specialists have advanced degrees, but a relevant academic background is a positive signal of the specialist’s depth and commitment to his or her area of expertise. Publications and presentations are also helpful indicators of that depth. When I evaluate specialists in an area where I have generalist knowledge, I expect them to humble me and teach me something new.
Of course, the ideal data scientist is a strong generalist who also brings unique specialties that complement the rest of the team. But that ideal is a unicorn—or maybe even an alicorn. Even if you are lucky enough to find these rare animals, you’ll struggle to keep them engaged in work that is unlikely to exercise their full range of capabilities.
So, should you hire generalists or specialists? It really does depend—and the largest factor in your decision should be your company’s stage of maturity. But if you're still not sure, then I suggest you favor generalists, especially if your company is still in a stage of rapid growth. Your problems are probably not as specialized as you think, and hiring generalists reduces your risk. Plus, hiring generalists allows you to give them the opportunity to learn specialized skills on the job. Everybody wins.