Pandas 1.x Cookbook - Second Edition

Book description

Use the power of pandas to solve even the most complex scientific computing problems with ease. Revised for pandas 1.x.

Key Features

  • This is the first book on pandas 1.x
  • Practical, easy-to-implement recipes for quick solutions to common data problems using pandas
  • Master the fundamentals of pandas to quickly begin exploring any dataset

Book Description

The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through situations that you are highly likely to encounter.

This updated and revised edition provides you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or on comparing and contrasting two similar operations. Other recipes dive deep into a particular dataset, uncovering new and unexpected insights along the way. Many advanced recipes combine several different features across the pandas library to generate results.

What you will learn

  • Master data exploration in pandas through dozens of practice problems
  • Group, aggregate, transform, reshape, and filter data
  • Merge data from different sources through pandas' SQL-like operations
  • Create visualizations via pandas' hooks to matplotlib and seaborn
  • Use pandas' time series functionality to perform powerful analyses
  • Import, clean, and prepare real-world datasets for machine learning
  • Create workflows for processing big data that doesn’t fit in memory

Who this book is for

This book is for Python developers, data scientists, engineers, and analysts. Pandas is the ideal tool for manipulating structured data with Python, and this book provides ample instruction and examples. Not only does it cover the basics required to be proficient, but it also goes into the details of idiomatic pandas.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Running a Jupyter Notebook
      3. Download the color images
      4. Conventions
    4. Get in touch
      1. Reviews
  2. Pandas Foundations
    1. Importing pandas
    2. Introduction
    3. The pandas DataFrame
      1. How it works…
    4. DataFrame attributes
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Understanding data types
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Selecting a column
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Calling Series methods
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Series operations
      1. How to do it…
      2. How it works…
      3. There's more…
    9. Chaining Series methods
      1. How to do it…
      2. How it works…
      3. There's more…
    10. Renaming column names
      1. How to do it…
      2. How it works…
      3. There's more…
    11. Creating and deleting columns
      1. How to do it…
      2. How it works…
      3. There's more…
  3. Essential DataFrame Operations
    1. Introduction
    2. Selecting multiple DataFrame columns
      1. How to do it...
      2. How it works...
      3. There's more...
    3. Selecting columns with methods
      1. How to do it...
      2. How it works...
      3. There's more...
    4. Ordering column names
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Summarizing a DataFrame
      1. How to do it...
      2. How it works...
      3. There's more...
    6. Chaining DataFrame methods
      1. How to do it...
      2. How it works...
      3. There's more...
    7. DataFrame operations
      1. How to do it...
      2. How it works...
      3. There's more...
    8. Comparing missing values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    9. Transposing the direction of a DataFrame operation
      1. How to do it...
      2. How it works...
      3. There's more...
    10. Determining college campus diversity
      1. How to do it...
      2. How it works...
      3. There's more...
  4. Creating and Persisting DataFrames
    1. Introduction
    2. Creating DataFrames from scratch
      1. How to do it...
      2. How it works...
      3. There's more...
    3. Writing CSV
      1. How to do it...
      2. There's more...
    4. Reading large CSV files
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Using Excel files
      1. How to do it...
      2. How it works...
      3. There's more...
    6. Working with ZIP files
      1. How to do it...
      2. How it works...
      3. There's more...
    7. Working with databases
      1. How to do it...
      2. How it works...
    8. Reading JSON
      1. How to do it...
      2. How it works...
      3. There's more...
    9. Reading HTML tables
      1. How to do it...
      2. How it works...
      3. There's more...
  5. Beginning Data Analysis
    1. Introduction
    2. Developing a data analysis routine
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Data dictionaries
    4. Reducing memory by changing data types
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Selecting the smallest of the largest
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Selecting the largest of each group by sorting
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Replicating nlargest with sort_values
      1. How to do it…
      2. How it works…
    8. Calculating a trailing stop order price
      1. How to do it…
      2. How it works…
      3. There's more…
  6. Exploratory Data Analysis
    1. Introduction
    2. Summary statistics
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Column types
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Categorical data
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Continuous data
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Comparing continuous values across categories
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Comparing two continuous columns
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Comparing categorical values with categorical values
      1. How to do it…
      2. How it works…
    9. Using the pandas profiling library
      1. How to do it…
      2. How it works…
  7. Selecting Subsets of Data
    1. Introduction
    2. Selecting Series data
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Selecting DataFrame rows
      1. How it works…
      2. There's more…
    4. Selecting DataFrame rows and columns simultaneously
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Selecting data with both integers and labels
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Slicing lexicographically
      1. How to do it…
      2. How it works…
      3. There's more…
  8. Filtering Rows
    1. Introduction
    2. Calculating Boolean statistics
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Constructing multiple Boolean conditions
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Filtering with Boolean arrays
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Comparing row filtering and index filtering
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Selecting with unique and sorted indexes
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Translating SQL WHERE clauses
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Improving the readability of Boolean indexing with the query method
      1. How to do it…
      2. How it works…
      3. There's more…
    9. Preserving Series size with the .where method
      1. How to do it…
      2. How it works…
      3. There's more…
    10. Masking DataFrame rows
      1. How to do it…
      2. How it works…
      3. There's more…
    11. Selecting with Booleans, integer location, and labels
      1. How to do it…
      2. How it works…
  9. Index Alignment
    1. Introduction
    2. Examining the Index object
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Producing Cartesian products
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Exploding indexes
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Filling values with unequal indexes
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Adding columns from different DataFrames
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Highlighting the maximum value from each column
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Replicating idxmax with method chaining
      1. How to do it…
      2. How it works…
      3. There's more…
    9. Finding the most common maximum of columns
      1. How to do it…
      2. How it works…
      3. There's more…
  10. Grouping for Aggregation, Filtration, and Transformation
    1. Introduction
    2. Defining an aggregation
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Grouping and aggregating with multiple columns and functions
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Removing the MultiIndex after grouping
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Grouping with a custom aggregation function
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Customizing aggregating functions with *args and **kwargs
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Examining the groupby object
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Filtering for states with a minority majority
      1. How to do it…
      2. How it works…
      3. There's more…
    9. Transforming through a weight loss bet
      1. How to do it…
      2. How it works…
      3. There's more…
    10. Calculating weighted mean SAT scores per state with apply
      1. How to do it…
      2. How it works…
      3. There's more…
    11. Grouping by continuous variables
      1. How to do it…
      2. How it works…
      3. There's more…
    12. Counting the total number of flights between cities
      1. How to do it…
      2. How it works…
      3. There's more…
    13. Finding the longest streak of on-time flights
      1. How to do it…
      2. How it works…
      3. There's more…
  11. Restructuring Data into a Tidy Form
    1. Introduction
    2. Tidying variable values as column names with stack
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Tidying variable values as column names with melt
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Stacking multiple groups of variables simultaneously
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Inverting stacked data
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Unstacking after a groupby aggregation
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Replicating pivot_table with a groupby aggregation
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Renaming axis levels for easy reshaping
      1. How to do it…
      2. How it works…
      3. There's more…
    9. Tidying when multiple variables are stored as column names
      1. How to do it…
      2. How it works…
      3. There's more…
    10. Tidying when multiple variables are stored as a single column
      1. How to do it…
      2. How it works…
      3. There's more…
    11. Tidying when two or more values are stored in the same cell
      1. How to do it…
      2. How it works…
      3. There's more…
    12. Tidying when variables are stored in column names and values
      1. How to do it…
      2. How it works…
      3. There's more…
  12. Combining Pandas Objects
    1. Introduction
    2. Appending new rows to DataFrames
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Concatenating multiple DataFrames together
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Understanding the differences between concat, join, and merge
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Connecting to SQL databases
      1. How to do it…
      2. How it works…
      3. There's more…
  13. Time Series Analysis
    1. Introduction
    2. Understanding the difference between Python and pandas date tools
      1. How to do it…
      2. How it works…
    3. Slicing time series intelligently
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Filtering columns with time data
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Using methods that only work with a DatetimeIndex
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Counting the number of weekly crimes
      1. How to do it…
      2. How it works…
      3. There's more…
    7. Aggregating weekly crime and traffic accidents separately
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Measuring crime by weekday and year
      1. How to do it…
      2. How it works…
      3. There's more…
    9. Grouping with anonymous functions with a DatetimeIndex
      1. How to do it…
      2. How it works…
    10. Grouping by a Timestamp and another column
      1. How to do it…
      2. How it works…
      3. There's more…
  14. Visualization with Matplotlib, Pandas, and Seaborn
    1. Introduction
    2. Getting started with matplotlib
    3. Object-oriented guide to matplotlib
      1. How to do it…
      2. How it works…
      3. There's more…
    4. Visualizing data with matplotlib
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Plotting basics with pandas
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Visualizing the flights dataset
      1. How to do it…
      2. How it works…
    7. Stacking area charts to discover emerging trends
      1. How to do it…
      2. How it works…
    8. Understanding the differences between seaborn and pandas
      1. How to do it…
      2. How it works…
    9. Multivariate analysis with seaborn Grids
      1. How to do it…
      2. How it works…
      3. There's more…
    10. Uncovering Simpson's Paradox in the diamonds dataset with seaborn
      1. How to do it…
      2. How it works…
      3. There's more…
  15. Debugging and Testing Pandas
    1. Code to transform data
      1. How to do it…
      2. How it works…
    2. Apply performance
      1. How to do it…
      2. How it works…
      3. There's more…
    3. Improving apply performance with Dask, Pandarallel, Swifter, and more
      1. How to do it…
      2. How it works…
    4. Inspecting code
      1. How to do it…
      2. How it works…
      3. There's more…
    5. Debugging in Jupyter
      1. How to do it…
      2. How it works…
      3. There's more…
    6. Managing data integrity with Great Expectations
      1. How to do it…
      2. How it works…
    7. Using pytest with pandas
      1. How to do it…
      2. How it works…
      3. There's more…
    8. Generating tests with Hypothesis
      1. How to do it…
      2. How it works…
  16. Other Books You May Enjoy
  17. Index

Product information

  • Title: Pandas 1.x Cookbook - Second Edition
  • Author(s): Matt Harrison, Theodore Petrou
  • Release date: February 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781839213106