Engraving of the reading room at the British Museum

Executive Summary

Now in its third edition, the 2015 Data Science Salary Survey explores patterns in tools, tasks, and compensation through the lens of clustering and linear models. The research is based on data collected through an online 32-question survey covering demographic information, time spent on various data-related tasks, and the use or non-use of 116 software tools. Over 600 respondents from a variety of industries, two-thirds of them based in the United States, completed the survey.

Key findings include:

  • The same four tools—SQL, Excel, R, and Python—remain at the top for the third year in a row
  • Spark (and Scala) use has grown tremendously from last year, and their users tend to earn more
  • Compared with last year’s data, R is now used by more data professionals who otherwise tend to use commercial tools
  • Conversely, R is no longer used as frequently by data practitioners who use other open source tools such as Python or Spark
  • Salaries in the software industry are highest
  • Even when all other variables are held equal, women are paid thousands less than their male counterparts
  • Cloud computing (still) pays
  • About 40% of the variation in respondents’ salaries can be explained by the other data they provided
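
As a rough illustration of the last finding: the share of salary variation that a model accounts for is usually reported as R², computed as one minus the ratio of residual variation to total variation. The sketch below uses made-up numbers (in thousands of dollars), not survey data, and the function name `r_squared` is an assumption chosen for this example.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the observed values around their mean.
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    # Variation left over after the model's predictions are applied.
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Made-up salaries and model predictions, in thousands of dollars.
actual = [60, 80, 100, 120]
predicted = [80, 100, 80, 120]

print(round(r_squared(actual, predicted), 2))  # 0.4
```

An R² of 0.4 means the model leaves 60% of the variation unexplained, so individual salaries can still differ widely from what the model predicts.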

We invite you not only to read the report but also to participate: try plugging your own information into one of the linear models to predict your own salary. And, of course, the survey is open for the 2016 report. Spend just 5 to 10 minutes and take the anonymous salary survey here. Thank you!
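
Plugging your information into a linear model amounts to starting from an intercept and adding the coefficient of each feature multiplied by your value for it. The following is a minimal sketch of that arithmetic; the feature names, the intercept, and every coefficient are hypothetical placeholders, not values from the report.

```python
# Hypothetical intercept and coefficients in US dollars.
# These are NOT the values published in the report.
INTERCEPT = 60_000
COEFFICIENTS = {
    "years_experience": 1_500,  # added per year of experience
    "uses_spark": 10_000,       # 1 if the respondent uses the tool, else 0
    "uses_cloud": 5_000,        # 1 if the respondent uses cloud computing, else 0
}


def predict_salary(features):
    """Intercept plus the sum of coefficient * feature value.

    Features missing from `features` are treated as 0.
    """
    return INTERCEPT + sum(
        coefficient * features.get(name, 0)
        for name, coefficient in COEFFICIENTS.items()
    )


example = {"years_experience": 4, "uses_spark": 1, "uses_cloud": 0}
print(predict_salary(example))  # 76000
```

To estimate a salary from one of the report's models, you would substitute its published intercept and coefficients for the placeholders above.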

Article image: Engraving of the reading room at the British Museum