Now that you have learned how to select the data you want from a database table and filter to the rows you want, you might wonder what to do if the data you need exists across multiple related tables in the database. For example, one analytical question mentioned in Chapter 1, “When is each type of fresh fruit or vegetable in season, locally?” requires data from the product_category table (to filter to the categories with fresh fruit and vegetables), the product table (to get details about each specific item, including product names and quantity types), and the vendor_inventory table (to find out when vendors were selling these products). This is where SQL JOINs come in.

Database Relationships and SQL JOINs

In Chapter 1, “Data Sources,” we introduced different types of database relationships and the entity-relationship diagram (ERD). The type of relationship between database tables, and the key fields that connect them, give us information we need to combine them using a JOIN statement in SQL.

Let's say we wanted to list each product name along with its product category name. Since only the ID of the product category exists in the product table, and the product category's name is in the product_category table, we have to combine the data in the product and product_category tables together in order to generate this list.

Figure 5.1 shows the one-to-many relationship between these two tables: each product can only belong to one category, but each category ...

Get SQL for Data Scientists now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.