CHAPTER 13 Analytical Dataset Development Examples

In this chapter, I will walk through the development of datasets for answering different types of analytical questions. Doing so combines multiple concepts from previous chapters into more complex queries, so the material is more advanced. Note that the example database doesn't currently contain enough correlated data to actually perform the analyses that would follow dataset development, so we won't be looking for trends in the output screenshots. The focus here is on how I would design and build a dataset from our Farmer's Market database using SQL to answer each of the following analytical questions:

  • What factors correlate with fresh produce sales?
  • How do sales vary by customer zip code, market distance, and demographic data?
  • How does product price distribution affect market sales?

What Factors Correlate with Fresh Produce Sales?

Let's say we're asked the analytical question “What factors are correlated with sales of fresh produce at the farmer's market?” In other words, we're being asked to determine the relationships between a selection of variables and a subset of market product sales. From a data perspective, that means we'll need to summarize each variable over periods of time and explore how sales during those same periods change as the variable changes.
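To make the idea of summarizing variables and sales over the same time periods concrete, here is a minimal sketch that runs against an in-memory SQLite database. The table names (`purchases`, `inventory`), their columns, and the sample rows are all made up for illustration and are not the actual Farmer's Market schema. The query produces one row per market date, pairing a candidate explanatory variable (the number of distinct products available) with fresh produce sales for that date.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only. The table names,
# column names, and sample rows are assumptions, not the book's actual
# Farmer's Market database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (
    market_date TEXT,       -- ISO date of the market
    product_category TEXT,  -- e.g. 'Fresh Produce'
    quantity REAL,
    price_per_unit REAL
);
CREATE TABLE inventory (
    market_date TEXT,
    product_id INTEGER
);
INSERT INTO purchases VALUES
    ('2020-07-01', 'Fresh Produce', 2.0, 3.50),
    ('2020-07-01', 'Fresh Produce', 1.0, 4.00),
    ('2020-07-01', 'Baked Goods',   1.0, 6.00),
    ('2020-07-04', 'Fresh Produce', 3.0, 2.00);
INSERT INTO inventory VALUES
    ('2020-07-01', 1), ('2020-07-01', 2), ('2020-07-01', 3),
    ('2020-07-04', 1), ('2020-07-04', 2);
""")

query = """
WITH produce_sales AS (
    -- Summarize the target: fresh produce sales per market date
    SELECT market_date,
           ROUND(SUM(quantity * price_per_unit), 2) AS produce_sales
    FROM purchases
    WHERE product_category = 'Fresh Produce'
    GROUP BY market_date
),
products_available AS (
    -- Summarize one candidate variable over the same period
    SELECT market_date,
           COUNT(DISTINCT product_id) AS product_count
    FROM inventory
    GROUP BY market_date
)
SELECT pa.market_date,
       pa.product_count,
       COALESCE(ps.produce_sales, 0) AS produce_sales
FROM products_available AS pa
LEFT JOIN produce_sales AS ps
    ON ps.market_date = pa.market_date
ORDER BY pa.market_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each summary is computed in its own common table expression at the same granularity (one row per market date) before joining, so the join cannot duplicate rows and inflate the totals. The `LEFT JOIN` with `COALESCE` keeps market dates on which no fresh produce was sold, recording them as zero sales instead of dropping them.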

For example, “As the number of different available products at the market increases, do sales of fresh produce ...
