Skip to Content
Learning Data Science
book

Learning Data Science

by Sam Lau, Joseph Gonzalez, Deborah Nolan
September 2023
Beginner
596 pages
15h 31m
English
O'Reilly Media, Inc.
Content preview from Learning Data Science

Chapter 7. Working with Relations Using SQL

In Chapter 6, we used dataframes to represent tables of data. This chapter introduces relations, another widely used way to represent data tables. We also introduce SQL, the standard programming language for working with relations. Here’s an example of a relation that holds information about popular dog breeds.

Like dataframes, each row in a relation represents a single record—in this case, a single dog breed. Each column represents a feature about the record—for example, the grooming column represents how often each dog breed needs to be groomed.

Both relations and dataframes have labels for each column in the table. However, one key difference is that the rows in a relation don’t have labels, while rows in a dataframe do.

In this chapter, we demonstrate common relation operations using SQL. We start by explaining the structure of SQL queries. Then we show how to use SQL to perform common data manipulation tasks, like slicing, filtering, sorting, grouping, and joining.

Note

This chapter replicates the data analyses in Chapter 6 using relations and SQL instead of dataframes and Python. The datasets, data manipulations, and conclusions are nearly identical across the two chapters for ease of comparison between performing data manipulations in pandas and SQL.

Subsetting

To work with relations, we’ll introduce a domain-specific programming language called SQL (Structured Query Language). We commonly pronounce “SQL” like “sequel” instead ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Dive Into Data Science

Dive Into Data Science

Bradford Tuckfield
Introducing Data Science

Introducing Data Science

Arno Meysman, Davy Cielen, Mohamed Ali

Publisher Resources

ISBN: 9781098112998Errata PageSupplemental Content