CHAPTER 6Aggregating Results for Analysis
SQL starts becoming especially powerful for analysis when you use it to aggregate data. By using the
GROUP BY
statement, you can specify the level of summarization and then use aggregate functions to summarize values for the records in each group.
Data analysts can use SQL to build dynamic summary reports that can be automatically updated as the database is updated with new data, by simply triggering a refresh that reruns the query. Dashboards and reports built using software like Tableau and Cognos often rely on SQL queries to get the data they need from the underlying database in an aggregated form that can be used for reporting, which we'll cover in Chapter 10, “Building Analytical Reports with SQL.” Data scientists can use SQL to summarize data at the level of granularity needed for training a classification model, which we'll get into in more depth in Chapter 12, “SQL for Machine Learning.”
But it all starts with basic SQL aggregation.
GROUP BY Syntax
You saw this basic SQL
SELECT
query syntax in Chapter 2, “The SELECT Statement.” Two sections of this query that we haven't yet covered, which are both related to aggregation, are the
GROUP BY
and
HAVING
clauses:
- SELECT [columns to return]
- FROM [table]
- WHERE [conditional filter statements]
- GROUP BY [columns to group on]
- HAVING [conditional filter statements that are run after grouping]
- ORDER BY [columns to sort on]
The
GROUP BY
keywords are followed by a comma-separated list of ...
Get SQL for Data Scientists now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.