Chapter 6. Working with Dataframes Using pandas

Data scientists work with data stored in tables. This chapter introduces dataframes, one of the most widely used ways to represent data tables. We also introduce pandas, the standard Python package for working with dataframes. Here is an example of a dataframe that holds information about popular dog breeds:

breed               grooming  food_cost  kids    size
Labrador Retriever  weekly    466.0      high    medium
German Shepherd     weekly    466.0      medium  large
Beagle              daily     324.0      high    small
Golden Retriever    weekly    466.0      high    medium
Yorkshire Terrier   daily     324.0      low     small
Bulldog             weekly    466.0      medium  medium
Boxer               weekly    466.0      high    medium

In a dataframe, each row represents a single record—in this case, a single dog breed. Each column represents a feature about the record—for example, the grooming column represents how often each dog breed needs to be groomed.
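As a minimal sketch, the example table could be built in pandas as follows. The values are copied from the table above; the variable name `dogs` and the choice to build the dataframe from a dictionary of columns are assumptions made for illustration, not necessarily how the chapter loads its data.

```python
import pandas as pd

# Each dictionary key becomes a column (a feature); each list position
# becomes a row (one record per dog breed). `dogs` is a name we chose.
dogs = pd.DataFrame(
    {
        "grooming": ["weekly", "weekly", "daily", "weekly", "daily", "weekly", "weekly"],
        "food_cost": [466.0, 466.0, 324.0, 466.0, 324.0, 466.0, 466.0],
        "kids": ["high", "medium", "high", "high", "low", "medium", "high"],
        "size": ["medium", "large", "small", "medium", "small", "medium", "medium"],
    },
    # The breed names serve as row labels rather than as an ordinary column.
    index=pd.Index(
        [
            "Labrador Retriever",
            "German Shepherd",
            "Beagle",
            "Golden Retriever",
            "Yorkshire Terrier",
            "Bulldog",
            "Boxer",
        ],
        name="breed",
    ),
)

print(dogs.shape)  # (7, 4): seven records, four features
print(dogs)
```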

Dataframes have labels for both columns and rows. For instance, this dataframe has a column labeled grooming and a row labeled German Shepherd. The columns and rows of a dataframe are ordered—we can refer to the Labrador Retriever row as the first row of the dataframe.
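The sketch below illustrates both ideas, labels and ordering, on a three-row subset of the table. The variable names are our own; `[]`, `.loc`, and `.iloc` are standard pandas accessors for selecting a column by label, a row by label, and a row by position.

```python
import pandas as pd

# A small subset of the dog breed table, rebuilt here so the example
# runs on its own. The name `dogs` is chosen for illustration.
dogs = pd.DataFrame(
    {
        "grooming": ["weekly", "weekly", "daily"],
        "food_cost": [466.0, 466.0, 324.0],
    },
    index=pd.Index(
        ["Labrador Retriever", "German Shepherd", "Beagle"], name="breed"
    ),
)

# Select a column by its label.
grooming = dogs["grooming"]

# Select a row by its label.
shepherd = dogs.loc["German Shepherd"]

# Select a row by its position: rows are ordered, and position 0 is the first row.
first_row = dogs.iloc[0]

print(first_row.name)  # Labrador Retriever
```

Label-based access with `.loc` keeps working if the rows are reordered, while position-based access with `.iloc` depends on the current row order.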

Within a column, all values share the same data type. For instance, the food_cost column holds numbers, and the size column holds categories. Within a row, though, data types can differ from one column to the next.
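One way to see this in pandas is to inspect the `dtypes` attribute, as in the sketch below on a three-row subset of the table. The variable names are our own. Note that pandas stores text columns as a generic string or object type by default (the exact name depends on the pandas version), so converting `size` to the categorical type is an optional step shown here as an assumption, not something the table requires.

```python
import pandas as pd

# A small subset of the dog breed table, rebuilt so the example runs on its own.
dogs = pd.DataFrame(
    {
        "grooming": ["weekly", "weekly", "daily"],
        "food_cost": [466.0, 466.0, 324.0],
        "size": ["medium", "large", "small"],
    },
    index=pd.Index(
        ["Labrador Retriever", "German Shepherd", "Beagle"], name="breed"
    ),
)

# One data type per column: food_cost is stored as float64.
print(dogs.dtypes)

# Optionally mark size as categorical data.
size_cat = dogs["size"].astype("category")
print(size_cat.cat.categories)

# A single row mixes types: text for grooming, a number for food_cost.
row = dogs.loc["Beagle"]
print(type(row["grooming"]), type(row["food_cost"]))
```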

Because of these properties, dataframes enable all sorts of useful operations.

Note

Data scientists often find themselves ...

Learning Data Science by Sam Lau, Joseph Gonzalez, Deborah Nolan
