CHAPTER 10
Building SQL Datasets for Analytical Reporting

In previous chapters, we covered basic SQL SELECT syntax and started to use SQL to construct datasets to answer specific questions. In the data analysis world, ad-hoc reporting is the process of being asked a question, exploring a database, writing SQL statements to find and pull the data needed to answer it, and analyzing that data to calculate the answer.

I often say in my data science conference presentations that the process depicted in Figure 10.1 is what is expected of any data analyst or data scientist: to be able to listen to a question from a business stakeholder, determine how it might be answered using data from the database, retrieve the data needed to answer it, calculate the answers, and present that result in a form that the business stakeholder can understand and use to make decisions.

Figure 10.1: Schematic illustration of business questions and the corresponding answer.

You now know enough SQL to answer some basic ad-hoc questions about what is occurring at the fictional farmer’s market using the demonstration database by filtering, joining, and summarizing the data.

In the remaining chapters, we'll take those skills to the next level and demonstrate how to think through multiple analysis questions, simulating what it might be like to write queries to answer a question posed by a business stakeholder. We'll design and develop analytical ...
