Chapter 5

Exploring Data Analysis


check Understanding the exploratory data analysis (EDA) philosophy

check Describing numeric and categorical distributions

check Estimating correlation and association

check Testing mean differences in groups

check Visualizing distributions, relationships, and groups

“If you torture the data long enough, it will confess.”


Data science relies on complex algorithms for building predictions and spotting important signals in data, and each algorithm presents different strong and weak points. In short, you select a range of algorithms, you have them run on the data, you optimize their parameters as much as you can, and finally you decide which one will best help you build your data product or generate insight into your problem.

It sounds a little bit automatic and, partially, it is, thanks to powerful analytical software and scripting languages like Python. Learning algorithms are complex, and their sophisticated procedures naturally seem automatic and ...

Get Coding All-in-One For Dummies now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.