In this fourth edition of the O’Reilly Data Science
Salary Survey, we’ve analyzed input from 983 respondents
working in the data space, across a variety of industries—
representing 45 countries and 45 US states. Through the
results of our 64-question survey, we’ve explored which tools
data scientists, analysts, and engineers use, which tasks they
engage in, and, of course, how much they make.
Key findings include:
- Python and Spark are among the tools that contribute
most to salary.
- Among those who code, the highest earners are the ones
who code the most.
- SQL, Excel, R, and Python are the most commonly used
tools.
- Those who attend more meetings earn more.
- Women make less than men for doing the same thing.
- Country and US state GDP serves as a decent proxy for
geographic salary variation (not as a direct estimate, but
as an additional input for a model).
- The most salient division between tool and task usage
is between those who mostly use Excel, SQL, and a small
number of closed source tools—and those who use more
open source tools and spend more time coding.
- R is used across this division: even people who don’t code
much or use many open source tools still use R.
- A secondary division emerges among the coding half—
separating a younger, Python-heavy data scientist/analyst
group, from a more experienced data scientist/engineer
cohort that tends to use a high number of tools and earns
the highest salaries.
To view our complete model and enter your own metrics to
predict salary, see Appendix B. Beware that there’s a transformation
involved: don’t forget to square the result!
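
The squaring step mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming the model is an additive sum of coefficients that predicts the square root of salary. The intercept, feature names, and coefficient values below are hypothetical placeholders, not the figures from Appendix B.

```python
# Hypothetical intercept and coefficients, for illustration only.
# The real values are listed in Appendix B of the report.
HYPOTHETICAL_INTERCEPT = 200.0
HYPOTHETICAL_COEFFICIENTS = {
    "uses_python": 10.0,
    "uses_spark": 15.0,
    "years_experience": 3.0,
}


def predict_salary(features):
    """Add up the linear terms, then square the sum.

    Assumes the model predicts the square root of salary, so squaring
    the sum converts the result back into salary units.
    """
    transformed = HYPOTHETICAL_INTERCEPT + sum(
        HYPOTHETICAL_COEFFICIENTS[name] * value
        for name, value in features.items()
    )
    return transformed ** 2  # undo the square-root transformation


estimate = predict_salary(
    {"uses_python": 1, "uses_spark": 0, "years_experience": 5}
)
print(f"Predicted salary: {estimate:,.0f}")
```

With these placeholder numbers the linear terms sum to 225, and squaring gives 50,625. Skipping the last step would leave the estimate in transformed units, which is the mistake the warning is about.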