Chapter 14: Exploratory Data Analysis

In Chapter 13, Using Machine Learning without Premium or Embedded Capacity, we mentioned that blindly applying Automated Machine Learning (AutoML) solutions to a dataset often does not produce very accurate models. To get a better model, you first need to understand the dataset's inherent characteristics, using statistical tools at an early stage to extract useful information.
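As a rough illustration of what such an early statistical look can involve, the following minimal sketch uses pandas to count missing values, summarize the numeric columns, and compute their pairwise correlations. The helper name `quick_profile`, the toy data, and the column names are all hypothetical and chosen only for this example; they do not come from the chapter.

```python
import numpy as np
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Collect a few basic facts about a DataFrame (hypothetical helper)."""
    # Restrict summary statistics and correlations to numeric columns.
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,                # (rows, columns)
        "missing": df.isna().sum(),       # missing values per column
        "summary": numeric.describe(),    # count, mean, std, quartiles
        "correlation": numeric.corr(),    # pairwise Pearson correlation
    }


# Toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [22, 35, 58, np.nan, 41],
    "fare": [7.25, 53.10, 26.55, 8.05, 13.00],
    "embarked": ["S", "C", "S", None, "Q"],
})

profile = quick_profile(df)
print(profile["missing"])
print(profile["summary"])
```

Even a quick pass like this can reveal missing values or strongly related variables that are worth handling before any model is trained.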

The approach used for this type of dataset analysis is called Exploratory Data Analysis (EDA). It was first introduced by John Tukey to encourage statisticians to explore data and formulate hypotheses that would lead to new data collection and experiments to eventually enrich patterns among the ...
