Chapter 1 INTRODUCTION TO DATA SCIENCE

1.1 WHY DATA SCIENCE?

Data science is one of the fastest-growing fields in the world, with 6.5 times as many job openings in 2017 as in 2012.¹ Demand for data scientists is expected to keep growing. For example, in May 2017, IBM projected that yearly demand for “data scientist, data developers, and data engineers will reach nearly 700,000 openings by 2020.”² http://InfoWorld.com reported that the #1 “reason why data scientist remains the top job in America”³ is that “there is a shortage of talent.” That is why we wrote this book: to help alleviate the shortage of qualified data scientists.

1.2 WHAT IS DATA SCIENCE?

Simply put, data science is the systematic analysis of data within a scientific framework. That is, data science is the

  • adaptive, iterative, and phased approach to the analysis of data,
  • performed within a systematic framework,
  • that uncovers optimal models,
  • by assessing and accounting for the true costs of prediction errors.

Data science combines the

  • data‐driven approach of statistical data analysis,
  • computational power and programming acumen of computer science, and
  • domain‐specific business intelligence,

in order to uncover actionable and profitable nuggets of information from large databases.

In other words, data science allows us to extract actionable knowledge from under‐utilized databases. Thus, data warehouses that have been gathering dust can now be leveraged to uncover hidden profit and enhance ...
