Chapter 2. Data Science
The term “data science” connotes opportunity and excitement. Organizations across the globe are rushing to build data science teams. The 2015 Data Science Salary Survey reveals that usage of Spark and Scala has skyrocketed since 2014, and that their users tend to earn more. Organizations are also investing heavily in tools for their data science toolkits, including Hadoop, Spark, Kafka, Cassandra, D3, and Tableau, and the list keeps growing. Machine learning is another area of tremendous innovation in data science; see Alice Zheng’s report “Evaluating Machine Learning Models,” which outlines the basics of model evaluation and also dives into evaluation metrics and A/B testing.
So, where are we going? In a keynote talk at Strata + Hadoop World San Jose, US Chief Data Scientist DJ Patil provides a unique perspective on the future of data science, framed by the federal government’s three areas of immediate focus: using medical and genomic data to accelerate discovery and improve treatments, building “game changing” data products on top of thousands of open data sets, and working in an ethical manner to ensure data science protects privacy.
This chapter’s collection of blog posts reflects some hot topics related to the present and the future of data science. First, Jerry Overton takes a look at what it means to be a professional data science programmer, and explores best practices and commonly used tools. Russell Jurney then surveys ...