IN THIS CHAPTER
Understanding the exploratory data analysis (EDA) philosophy
Describing numeric and categorical distributions
Estimating correlation and association
Testing mean differences in groups
Visualizing distributions, relationships, and groups
“If you torture the data long enough, it will confess.”
— RONALD COASE
Data science relies on complex algorithms to build predictions and spot important signals in data, and each algorithm has its own strengths and weaknesses. In short, you select a range of algorithms, run them on the data, tune their parameters as far as you can, and finally decide which one best helps you build your data product or gain insight into your problem.
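To make that workflow concrete, here is a minimal sketch of the select, run, tune, and decide loop. Everything in it is an assumption made for illustration: the toy data, the two candidate models (a mean baseline and a hand-rolled k-nearest neighbors regressor), the single train/validation split, and the use of mean squared error as the yardstick. It sticks to the standard library so that it runs anywhere; in real work you would more likely reach for a package such as scikit-learn.

```python
import random
from statistics import mean

random.seed(0)

# Toy data, made up for illustration: y is roughly 2 * x plus noise.
xs = [random.uniform(0, 10) for _ in range(200)]
data = [(x, 2 * x + random.gauss(0, 1)) for x in xs]
train, valid = data[:150], data[150:]


def fit_mean(train_rows, _param):
    """Baseline model: always predict the mean of the training targets."""
    avg = mean(y for _, y in train_rows)
    return lambda x: avg


def fit_knn(train_rows, k):
    """k-nearest neighbors: average the targets of the k closest points."""
    def predict(x):
        nearest = sorted(train_rows, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def mse(model, rows):
    """Mean squared error of a fitted model on held-out rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)


# Each candidate pairs a fitting function with the parameter values to try.
candidates = {
    "mean baseline": (fit_mean, [None]),
    "k-nearest neighbors": (fit_knn, [1, 5, 15]),
}

# Run every algorithm with every parameter setting and record its score.
results = []
for name, (fit, grid) in candidates.items():
    for param in grid:
        score = mse(fit(train, param), valid)
        results.append((score, name, param))

# Decide: keep the combination with the lowest validation error.
best_score, best_name, best_param = min(results, key=lambda r: r[0])
print(f"{best_name} (param={best_param}), validation MSE={best_score:.2f}")
```

The loop happily ranks whatever candidates you hand it, but it tells you nothing about whether the data itself is sound.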
It sounds a little automatic and, in part, it is, thanks to powerful analytical software and scripting languages like Python. Learning algorithms are complex, and their sophisticated procedures naturally seem automatic and ...