Read or download the free "2015 Data Science Salary Survey" report to learn about tools, trends, and what pays (and what doesn't) for data professionals.
Data scientists are constantly looking outward, tapping into and extracting information from all manner of data in ways hardly imaginable not long ago. Much of the change is technological (data collection has multiplied, as have our means of processing it), but an important cultural shift has played a part, too, as shown by organizations' desire to become "data-driven" and by the wide availability of public APIs.
But how much do we look inward, at ourselves? The variety of data roles, both in subject and method, means that even those of us who have a strong grasp of what it means to be a data scientist in a particular domain or sub-field may not have a complete view of the data space as a whole. Just as the data we process and analyze for our organizations can be used to decide business actions, data about data scientists can help inform our career choices.
That's where we come in. O'Reilly Media has been conducting an annual survey of data professionals, asking questions primarily about tools, tasks, and salary, and we are now releasing the third installment of the associated report, the 2015 Data Science Salary Survey. The 2015 edition features a completely new graphic design for the report and its findings. In addition to estimating salary differences based on demographics and tool usage, we take a more detailed look at titles and at tasks, that is, how data professionals spend their workdays.
As in the last two reports, when we cluster tools, the single most obvious divide is between open source and proprietary tools. However, this divide is not nearly as sharp as before: while we still tend to use tools from one category or the other, the degree of overlap has risen. R in particular is representative of this shift, "migrating" from last year's two main open source clusters toward some of the proprietary tools when plotted using tool-to-tool correlation as a metric. Some Hadoop tools such as Hive, Cloudera, and Hortonworks show the same trend, to a lesser degree.
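For readers curious what "tool-to-tool correlation" can look like in practice, here is a minimal sketch. It assumes each respondent is recorded as a row of 0/1 flags (1 means "uses this tool") and computes the Pearson correlation between every pair of tool columns. The responses are invented toy values, not survey data, the names `ProprietaryA` and `ProprietaryB` are placeholders, and the report's actual method may well differ.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: one row per respondent, 1 = uses the tool.
# These values are invented for illustration; they are NOT survey results.
TOOLS = ["R", "Spark", "Scala", "ProprietaryA", "ProprietaryB"]
RESPONSES = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a tool used by everyone or no one carries no signal
    return cov / sqrt(var_x * var_y)


def tool_correlations(tools, responses):
    """Return {(tool_a, tool_b): correlation} for every pair of tools."""
    columns = {tool: [row[i] for row in responses] for i, tool in enumerate(tools)}
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(tools, 2)
    }


correlations = tool_correlations(TOOLS, RESPONSES)

# List pairs from most to least correlated.
for (a, b), r in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{a:>12} ~ {b:<12} {r:+.2f}")
```

Tools that tend to be used together end up with positive correlations and can be grouped into a cluster, while tools that are rarely used by the same people end up with negative ones. A tool whose correlations with both groups are weak sits between the clusters.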
One thing that hasn't changed is Spark's rise in popularity and association with some of the higher salaries. Scala (which, predictably, correlates strongly with Spark), Amazon EMR, and D3 are three other tools used by, on average, more highly paid data scientists in the sample.
Of course, this is not just about data scientists: those calling themselves analysts (business intelligence or otherwise), architects, developers, engineers, and managers are all in the sample. There are some key differences (engineers use less R, architects use more D3), but as we might expect, there is significant overlap in tools and tasks among all of these groups.
We hope that you will find this free report not only interesting but useful: maybe it can inform your next push to learn a language or (big) data framework. If you can spare just a few minutes, please take the survey yourself and share your perspective. This project is ongoing and, with your help, we can find more patterns — not about just any old data set, but about ourselves.