SQL for Data Scientists

by Renee M. P. Teate
September 2021
Beginner
288 pages
6h 54m
English
Wiley

CHAPTER 9: Exploratory Data Analysis with SQL

Exploratory Data Analysis (EDA) is often discussed in a data science context as a first step in the predictive modeling process, when a data scientist explores what the data in a provided dataset looks like prior to using it to build a predictive model. The SQL we'll be using in this chapter could be used at that point in the process, to explore an already-prepared dataset. But what if you don't have a dataset to work with yet?

Here we'll show examples that could occur even earlier in the data pipeline, as we explore raw data straight from the database tables (as opposed to an already-aggregated dataset, in which the raw data has been combined and transformed using SQL so that it is ready to be ingested into a model). If you are given access to a database for the first time, these are the types of queries you can run to familiarize yourself with its tables and data.
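As a rough illustration of the kinds of first-look queries described above, here is a minimal sketch. It assumes a small, made-up `product` table built in an in-memory SQLite database; the table name, column names, and rows are hypothetical stand-ins rather than the actual Farmer's Market schema, and Python's standard `sqlite3` module is used only to make the SQL runnable.

```python
import sqlite3

# Hypothetical stand-in for a raw table. The schema and rows below are
# invented for illustration and are not taken from the book's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        product_id          INTEGER PRIMARY KEY,
        product_name        TEXT,
        product_category_id INTEGER,
        product_qty_type    TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, "Tomatoes", 1, "lbs"),
        (2, "Jalapeno Peppers", 1, "lbs"),
        (3, "Strawberry Jam", 2, "unit"),
        (4, "Whole Wheat Bread", 3, "unit"),
        (5, "Carrots", 1, None),  # deliberately missing a quantity type
    ],
)

# 1. Peek at a few rows to see what the columns contain.
preview = conn.execute("SELECT * FROM product LIMIT 3").fetchall()

# 2. Compare the row count with the distinct key count
#    to check whether product_id is unique per row.
row_count, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT product_id) FROM product"
).fetchone()

# 3. Count rows per category to see how values are distributed.
category_counts = conn.execute(
    """
    SELECT product_category_id, COUNT(*) AS product_count
    FROM product
    GROUP BY product_category_id
    ORDER BY product_category_id
    """
).fetchall()

# 4. Count missing values in a column of interest.
null_qty_types = conn.execute(
    "SELECT COUNT(*) FROM product WHERE product_qty_type IS NULL"
).fetchone()[0]

print(preview)
print(row_count, distinct_ids)
print(category_counts)
print(null_qty_types)
```

The same four questions (what do the rows look like, is the key unique, how are values distributed, where are values missing) can be asked of any unfamiliar table by swapping in its real table and column names.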

There are of course many ways to conduct EDA, including in a Jupyter notebook with Python code, in a Tableau workbook, or using SQL. (I regularly do all three in my job as a data scientist.) In that later-stage EDA, once a dataset has been prepared, the focus is often on distributions of values, relationships between columns, and identifying correlations between input features and the target variable (the column whose values the model will predict). Here, we will use the types of queries we've covered so far in this book to explore some tables in the Farmer's Market database, ...

ISBN: 9781119669364