Skip to Content
SQL for Data Analysis
book

SQL for Data Analysis

by Cathy Tanimura
September 2021
Beginner to intermediate
357 pages
9h 53m
English
O'Reilly Media, Inc.
Book available
Content preview from SQL for Data Analysis

Chapter 8. Creating Complex Data Sets for Analysis

In Chapters 3 through 7, we looked at a number of ways in which SQL can be used to perform analysis on data in databases. In addition to these specific use cases, sometimes the goal of a query is to assemble a data set that is specific yet general-purpose enough that it can be used to perform a variety of further analyses. The destination might be a database table, a text file, or a business intelligence tool. The SQL that is needed might be simple, requiring only a few filters or aggregations. Often, however, the code or logic needed to achieve the desired data set can become very complex. Additionally, such code is likely to be updated over time, as stakeholders request additional data points or calculations. The organization, performance, and maintainability of your SQL code become critical in a way that isn’t the case for one-time analyses.

In this chapter, I’ll discuss principles for organizing code so that it’s easier to share and update. Then I’ll discuss when to keep query logic in the SQL and when to consider moving to permanent tables via ETL (extract-transform-load) code. Next, I’ll explain the options for storing intermediate results—subqueries, temp tables, and common table expressions (CTEs)—and considerations for using them in your code. Finally, I’ll wrap up with a look at techniques for reducing data set size and ideas for handling data privacy and removing personally identifiable information (PII).

When to Use ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

SQL for Data Analytics

SQL for Data Analytics

Upom Malik, Matt Goldwasser, Benjamin Johnston
Analytics Engineering with SQL and dbt

Analytics Engineering with SQL and dbt

Rui Pedro Machado, Helder Russa

Publisher Resources

ISBN: 9781492088776Errata PageSupplemental Content