Chapter 1. Exploratory Data Analysis
The thesis of this book is that data combined with practical methods can answer questions and guide decisions under uncertainty.
As an example, I present a case study motivated by a question I heard when my wife and I were expecting our first child: do first babies tend to arrive late?
If you Google this question, you will find plenty of discussion. Some people claim it’s true, others say it’s a myth, and some people say it’s the other way around: first babies come early.
In many of these discussions, people provide data to support their claims. I found many examples like these:
“My two friends that have given birth recently to their first babies, BOTH went almost 2 weeks overdue before going into labour or being induced.”
“My first one came 2 weeks late and now I think the second one is going to come out two weeks early!!”
“I don’t think that can be true because my sister was my mother’s first and she was early, as with many of my cousins.”
Reports like these are called anecdotal evidence because they are based on data that is unpublished and usually personal. In casual conversation, there is nothing wrong with anecdotes, so I don’t mean to pick on the people I quoted.
But we might want evidence that is more persuasive and an answer that is more reliable. By those standards, anecdotal evidence usually fails, because:
- Small number of observations: If pregnancy length is longer for first babies, the difference is probably small compared to natural variation. ...