1. Statistics and Data Science

Statistical methods first came into use before homes had electricity, and they have since gone through several phases of rapid growth:

  • The first big boost came from manufacturers and farmers who were able to decrease costs, produce better products, and improve crop yields via statistical experiments.
  • Similar experiments helped drug companies graduate from snake oil purveyors to makers of scientifically proven remedies.
  • In the late 20th century, computing power enabled a new class of computationally intensive methods, like the resampling methods that we will study.
  • In the early decades of the current millennium, organizations discovered that the rapidly growing repositories of data they were collecting (“big data”) could be mined for useful insights.

As with any powerful tool, the more you know about it, the better you can apply it and the less likely you are to go astray. To see the lurking dangers, type the phrase "How to lie with..." into a web search engine. The likely autocompletion is "statistics."

Much of the book that follows deals with important issues that can determine whether data yields meaningful information or not:

  • How to assess the role that random chance can play in creating apparently interesting results or patterns in data
  • How to design experiments and surveys to get useful and reliable information
  • How to formulate simple statistical models to describe relationships between one variable and another

We will start our study in the next ...
