CHAPTER 2

Big Data Is Different

In Chapter 1, you considered what data science is and is not, and saw how data science is more than data analysis, computer science, or statistics. This chapter further explores data science as a new discipline.

The chapter begins by considering two of the most important issues associated with big data. Then it works through some real-life examples of big data techniques, and considers some of the communication issues involved in an effective big data team environment. Finally, it considers how statistics is and will be part of data science, and touches on the elements of the big data ecosystem.

Two Big Data Issues

Two issues associated with big data must be understood: the “curse” of big data and rapid data flow. The following sections discuss each in turn.

The Curse of Big Data

The “curse” of big data is the danger of recklessly applying and scaling data science techniques that have worked well for small, medium, and large data sets but do not necessarily work well for big data. This problem is well illustrated by the flaws found in big data trading (for which solutions are proposed in this chapter).

In short, the curse of big data is that when you search for patterns in large data sets with billions or trillions of data points and thousands of metrics, you are bound to identify coincidences that have no predictive power. Even worse, the strongest patterns might

  • Be caused entirely by chance (like ...
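The coincidence effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a method from the chapter: the target series and every metric are independent random noise, so any correlation found is due to chance alone. The function names (`pearson`, `strongest_chance_correlation`) and the sample sizes are hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_metrics, n_points, seed=0):
    """Largest |correlation| between a random target and n_metrics random metrics.

    Assumption for illustration: all series are independent Gaussian noise,
    so none of the metrics has any real predictive power.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_metrics):
        metric = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, metric)))
    return best


# Same seed, so the first 10 metrics are identical in both runs;
# the second run simply searches many more candidates.
few = strongest_chance_correlation(n_metrics=10, n_points=20)
many = strongest_chance_correlation(n_metrics=5000, n_points=20)
print(f"strongest correlation among 10 metrics:   {few:.2f}")
print(f"strongest correlation among 5000 metrics: {many:.2f}")
```

Searching more metrics can only raise the strongest correlation found, even though every metric here is pure noise. That is the sense in which a large search is "bound to identify coincidences".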
