Learning Data Science
by Sam Lau, Joseph Gonzalez, Deborah Nolan
September 2023
Beginner
596 pages
15h 31m
English
O'Reilly Media, Inc.

Chapter 6. Working with Dataframes Using pandas

Data scientists work with data stored in tables. This chapter introduces dataframes, one of the most widely used ways to represent data tables. We also introduce pandas, the standard Python package for working with dataframes. Here is an example of a dataframe that holds information about popular dog breeds:

                    grooming  food_cost  kids    size
breed
Labrador Retriever  weekly    466.0      high    medium
German Shepherd     weekly    466.0      medium  large
Beagle              daily     324.0      high    small
Golden Retriever    weekly    466.0      high    medium
Yorkshire Terrier   daily     324.0      low     small
Bulldog             weekly    466.0      medium  medium
Boxer               weekly    466.0      high    medium

In a dataframe, each row represents a single record—in this case, a single dog breed. Each column represents a feature about the record—for example, the grooming column represents how often each dog breed needs to be groomed.
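As a rough illustration of the row-and-column structure described above, the dog breed table could be built in pandas along these lines. This is a minimal sketch: the variable name `dogs` and the choice to build the table from a dictionary are assumptions made for this example, not necessarily how the book loads the data.

```python
import pandas as pd

# Hypothetical reconstruction of the example table; the values are copied
# from the table shown above.
dogs = pd.DataFrame(
    {
        "grooming": ["weekly", "weekly", "daily", "weekly", "daily", "weekly", "weekly"],
        "food_cost": [466.0, 466.0, 324.0, 466.0, 324.0, 466.0, 466.0],
        "kids": ["high", "medium", "high", "high", "low", "medium", "high"],
        "size": ["medium", "large", "small", "medium", "small", "medium", "medium"],
    },
    index=pd.Index(
        [
            "Labrador Retriever",
            "German Shepherd",
            "Beagle",
            "Golden Retriever",
            "Yorkshire Terrier",
            "Bulldog",
            "Boxer",
        ],
        name="breed",
    ),
)

# Each row is one record (a breed); each column is one feature.
n_records, n_features = dogs.shape

# Selecting a single column returns that feature for every record.
grooming = dogs["grooming"]
```

Here `dogs.shape` reports seven records and four features, and `grooming` is a pandas Series indexed by breed.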

Dataframes have labels for both columns and rows. For instance, this dataframe has a column labeled grooming and a row labeled German Shepherd. The columns and rows of a dataframe are ordered—we can refer to the Labrador Retriever row as the first row of the dataframe.
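The difference between looking up by label and looking up by position can be sketched with the `.loc` and `.iloc` accessors in pandas. The three-row `dogs` dataframe below is a hypothetical subset of the example table, trimmed only to keep the sketch short.

```python
import pandas as pd

# Hypothetical subset of the example table (first three breeds, two columns).
dogs = pd.DataFrame(
    {
        "grooming": ["weekly", "weekly", "daily"],
        "food_cost": [466.0, 466.0, 324.0],
    },
    index=pd.Index(
        ["Labrador Retriever", "German Shepherd", "Beagle"], name="breed"
    ),
)

# Label-based lookup: the row labeled German Shepherd, the column labeled grooming.
by_label = dogs.loc["German Shepherd", "grooming"]

# Position-based lookup: rows are ordered, so position 0 is the first row.
first_row = dogs.iloc[0]
first_label = first_row.name
```

In this sketch `by_label` is the grooming value for German Shepherd, and `first_label` is the row label of the first row, Labrador Retriever.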

Within a column, all values have the same type. For instance, the food_cost column contains numbers, and the size column contains categories. Within a row, however, the values can have different types.
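One way to see the typing rule in practice is to inspect a column's dtype and then the individual values in a single row. This is a small sketch on a hypothetical two-row subset of the example table; the exact dtype pandas reports for text columns can vary by pandas version, so the example only checks the numeric column.

```python
import pandas as pd

# Hypothetical subset of the example table.
dogs = pd.DataFrame(
    {
        "grooming": ["weekly", "daily"],
        "food_cost": [466.0, 324.0],
        "size": ["medium", "small"],
    },
    index=pd.Index(["Labrador Retriever", "Beagle"], name="breed"),
)

# Every value in a column shares one type: food_cost is a floating-point column.
food_is_float = pd.api.types.is_float_dtype(dogs["food_cost"])

# A single row mixes types: text for grooming and size, a number for food_cost.
row = dogs.loc["Beagle"]
grooming_is_text = isinstance(row["grooming"], str)
cost_is_number = isinstance(row["food_cost"], float)
```

Calling `dogs.dtypes` lists the type pandas inferred for each column, which is a quick check that a column was read in as expected.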

Because of these properties, dataframes enable all sorts of useful operations.

Note

Data scientists often find themselves ...

ISBN: 9781098112998