Pandas Cookbook

Book Description

Over 95 hands-on recipes to leverage the power of pandas for efficient scientific computation and data analysis

About This Book

  • Use the power of pandas to solve the most complex scientific computing problems with ease
  • Leverage fast, robust data structures in pandas to gain useful insights from your data
  • Practical, easy-to-implement recipes for quick solutions to common data problems using pandas

Who This Book Is For

This book is for data scientists, analysts, and Python developers who wish to explore data analysis and scientific computing in a practical, hands-on manner. The recipes in this book suit both novice and advanced users, and contain helpful tips, tricks, and caveats wherever necessary. Some understanding of pandas is helpful but not mandatory.

What You Will Learn

  • Master the fundamentals of pandas to quickly begin exploring any dataset
  • Isolate any subset of data by properly selecting and querying the data
  • Split data into independent groups before applying aggregations and transformations to each group
  • Restructure data into tidy form to make data analysis and visualization easier
  • Prepare real-world messy datasets for machine learning
  • Combine and merge data from different sources through pandas' SQL-like operations
  • Utilize pandas' unparalleled time series functionality
  • Create beautiful and insightful visualizations through pandas' direct hooks to Matplotlib and Seaborn
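
As a rough illustration of the "split data into independent groups" item above, the sketch below groups a small table and then applies one aggregation and one transformation per group. The data and the column names (`state`, `score`) are hypothetical and are not taken from the book's datasets; this is a minimal sketch of the general pandas pattern, not a recipe from the book.

```python
import pandas as pd

# Hypothetical toy data; not one of the book's datasets.
df = pd.DataFrame({
    "state": ["TX", "TX", "CA", "CA", "CA"],
    "score": [10, 20, 30, 40, 50],
})

# Aggregation: collapse each group to a single summary value.
mean_by_state = df.groupby("state")["score"].mean()

# Transformation: return one value per original row, computed within
# that row's group (here, each score minus its group's mean).
group_mean = df.groupby("state")["score"].transform("mean")
df["score_centered"] = df["score"] - group_mean

print(mean_by_state)
print(df)
```

The aggregation returns one row per group, while the transformation keeps the original row count, which is why its result can be assigned straight back as a new column.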

In Detail

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way.

The pandas library is massive, and it’s common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter.

Many advanced recipes combine several different features across the pandas library to generate results.

Style and approach

The author relies on his vast experience teaching pandas in a professional setting to deliver very detailed explanations for each line of code in all of the recipes. All code and dataset explanations exist in Jupyter Notebooks, an excellent interface for exploring data.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files sent to you.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
      1. Running a Jupyter Notebook
    3. Who this book is for
      1. How to get the most out of this book
    4. Conventions
    5. Assumptions for every recipe
    6. Dataset Descriptions
    7. Sections
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Reader feedback
    9. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Pandas Foundations
    1. Introduction
    2. Dissecting the anatomy of a DataFrame
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Accessing the main DataFrame components
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Understanding data types
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Selecting a single column of data as a Series
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Calling Series methods
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Working with operators on a Series
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Chaining Series methods together
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Making the index meaningful
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Renaming row and column names
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    11. Creating and deleting columns
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  3. Essential DataFrame Operations
    1. Introduction
    2. Selecting multiple DataFrame columns
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Selecting columns with methods
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Ordering column names sensibly
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Operating on the entire DataFrame
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Chaining DataFrame methods together
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Working with operators on a DataFrame
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Comparing missing values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Transposing the direction of a DataFrame operation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Determining college campus diversity
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  4. Beginning Data Analysis
    1. Introduction
    2. Developing a data analysis routine
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
        1. Data dictionaries
      5. See also
    3. Reducing memory by changing data types
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Selecting the smallest of the largest
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Selecting the largest of each group by sorting
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Replicating nlargest with sort_values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Calculating a trailing stop order price
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  5. Selecting Subsets of Data
    1. Introduction
    2. Selecting Series data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Selecting DataFrame rows
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Selecting DataFrame rows and columns simultaneously
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Selecting data with both integers and labels
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Speeding up scalar selection
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Slicing rows lazily
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Slicing lexicographically
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  6. Boolean Indexing
    1. Introduction
    2. Calculating boolean statistics
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Constructing multiple boolean conditions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Filtering with boolean indexing
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Replicating boolean indexing with index selection
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Selecting with unique and sorted indexes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Gaining perspective on stock prices
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Translating SQL WHERE clauses
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    9. Determining the normality of stock market returns
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Improving readability of boolean indexing with the query method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    11. Preserving Series with the where method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    12. Masking DataFrame rows
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    13. Selecting with booleans, integer location, and labels
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  7. Index Alignment
    1. Introduction
    2. Examining the Index object
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Producing Cartesian products
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Exploding indexes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Filling values with unequal indexes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Appending columns from different DataFrames
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Highlighting the maximum value from each column
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Replicating idxmax with method chaining
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Finding the most common maximum
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  8. Grouping for Aggregation, Filtration, and Transformation
    1. Introduction
    2. Defining an aggregation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Grouping and aggregating with multiple columns and functions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Removing the MultiIndex after grouping
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Customizing an aggregation function
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Customizing aggregating functions with *args and **kwargs
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Examining the groupby object
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Filtering for states with a minority majority
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    9. Transforming through a weight loss bet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Calculating weighted mean SAT scores per state with apply
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    11. Grouping by continuous variables
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    12. Counting the total number of flights between cities
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    13. Finding the longest streak of on-time flights
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  9. Restructuring Data into a Tidy Form
    1. Introduction
    2. Tidying variable values as column names with stack
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Tidying variable values as column names with melt
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Stacking multiple groups of variables simultaneously
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Inverting stacked data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Unstacking after a groupby aggregation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Replicating pivot_table with a groupby aggregation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Renaming axis levels for easy reshaping
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Tidying when multiple variables are stored as column names
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Tidying when multiple variables are stored as column values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    11. Tidying when two or more values are stored in the same cell
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Tidying when variables are stored in column names and values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    13. Tidying when multiple observational units are stored in the same table
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  10. Combining Pandas Objects
    1. Introduction
    2. Appending new rows to DataFrames
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Concatenating multiple DataFrames together
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Comparing President Trump's and Obama's approval ratings
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Understanding the differences between concat, join, and merge
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Connecting to SQL databases
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  11. Time Series Analysis
    1. Introduction
    2. Understanding the difference between Python and pandas date tools
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Slicing time series intelligently
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Using methods that only work with a DatetimeIndex
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Counting the number of weekly crimes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Aggregating weekly crime and traffic accidents separately
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Measuring crime by weekday and year
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Grouping with anonymous functions with a DatetimeIndex
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    9. Grouping by a Timestamp and another column
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    10. Finding the last time crime was 20% lower with merge_asof
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  12. Visualization with Matplotlib, Pandas, and Seaborn
    1. Introduction
    2. Getting started with matplotlib
      1. Getting ready
        1. Object-oriented guide to matplotlib
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Visualizing data with matplotlib
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Plotting basics with pandas
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Visualizing the flights dataset
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    6. Stacking area charts to discover emerging trends
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Understanding the differences between seaborn and pandas
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. See also
    8. Doing multivariate analysis with seaborn Grids
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Uncovering Simpson's paradox in the diamonds dataset with seaborn
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...