CHAPTER 7Window Functions and Subqueries

All of the functions that have been covered in this book so far, like ROUND ( ) , return one value in each row of the results dataset. When GROUP BY is used, the functions operate on multiple values in an aggregated group of records, summarizing across multiple rows in the underlying dataset, like AVG() , but each value returned is associated with a single row in the results.

Window functions operate across multiple records, as well, but those records don't have to be grouped in the output. This gives the ability to put the values from one row of data into context compared to a group of rows, or partition, enabling an analyst to write queries that answer questions like: If the dataset were sorted, where would this row land in the results? How does a value in this row compare to a value in the prior row? How does a value in the current row compare to the average value for its group?

So, window functions return group aggregate calculations alongside individual row‐level information for items in that group, or partition. They can also be used to rank or sort values within each partition.

One use for window functions in data science is to include some information from a past record alongside the most recent detail record related to an entity. For example, we could use window functions to get the date of the first purchase a person made at the farmer’s market, to be returned alongside their detailed purchase records, which could then be used ...

Get SQL for Data Scientists now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.