Book description
Use the power of pandas to solve even the most complex scientific computing problems with ease. Revised for pandas 1.x.
Key Features
- This is the first book on pandas 1.x
- Practical, easy-to-implement recipes for quick solutions to common data problems using pandas
- Master the fundamentals of pandas to quickly begin exploring any dataset
Book Description
The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through situations that you are highly likely to encounter.
This updated and revised edition provides you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or on comparing and contrasting two similar operations. Other recipes dive deep into a particular dataset, uncovering new and unexpected insights along the way. Many advanced recipes combine several different features across the pandas library to generate results.
What you will learn
- Master data exploration in pandas through dozens of practice problems
- Group, aggregate, transform, reshape, and filter data
- Merge data from different sources through pandas' SQL-like operations
- Create visualizations via pandas' hooks to matplotlib and seaborn
- Use pandas' time series functionality to perform powerful analyses
- Import, clean, and prepare real-world datasets for machine learning
- Create workflows for processing big data that doesn’t fit in memory
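To give a flavor of the operations listed above, here is a minimal sketch of merging, grouping, aggregating, and filtering in pandas. It is not taken from the book: the `orders` and `customers` tables, their column names, and the threshold of 50 are all hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "amount": [20.0, 35.5, 15.0, 50.0, 4.5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "state": ["TX", "CA", "TX"],
})

# SQL-like left join on the shared key column
merged = orders.merge(customers, on="customer_id", how="left")

# Group by state and aggregate the order amounts with two functions
summary = merged.groupby("state")["amount"].agg(["sum", "mean"])

# Keep only the rows belonging to states whose total exceeds 50
big_states = merged.groupby("state").filter(lambda g: g["amount"].sum() > 50)

print(summary)
print(big_states)
```

The same pattern (join, then group, then aggregate or filter) scales to real datasets, though the right join type and aggregation functions depend on the data at hand.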
Who this book is for
This book is for Python developers, data scientists, engineers, and analysts. Pandas is the ideal tool for manipulating structured data with Python, and this book provides ample instruction and examples. Not only does it cover the basics required to be proficient, but it also goes into the details of idiomatic pandas.
Table of contents
- Preface
- Pandas Foundations
- Essential DataFrame Operations
- Creating and Persisting DataFrames
- Beginning Data Analysis
- Exploratory Data Analysis
- Selecting Subsets of Data
- Filtering Rows
  - Introduction
  - Calculating Boolean statistics
  - Constructing multiple Boolean conditions
  - Filtering with Boolean arrays
  - Comparing row filtering and index filtering
  - Selecting with unique and sorted indexes
  - Translating SQL WHERE clauses
  - Improving the readability of Boolean indexing with the query method
  - Preserving Series size with the .where method
  - Masking DataFrame rows
  - Selecting with Booleans, integer location, and labels
- Index Alignment
- Grouping for Aggregation, Filtration, and Transformation
  - Introduction
  - Defining an aggregation
  - Grouping and aggregating with multiple columns and functions
  - Removing the MultiIndex after grouping
  - Grouping with a custom aggregation function
  - Customizing aggregating functions with *args and **kwargs
  - Examining the groupby object
  - Filtering for states with a minority majority
  - Transforming through a weight loss bet
  - Calculating weighted mean SAT scores per state with apply
  - Grouping by continuous variables
  - Counting the total number of flights between cities
  - Finding the longest streak of on-time flights
- Restructuring Data into a Tidy Form
  - Introduction
  - Tidying variable values as column names with stack
  - Tidying variable values as column names with melt
  - Stacking multiple groups of variables simultaneously
  - Inverting stacked data
  - Unstacking after a groupby aggregation
  - Replicating pivot_table with a groupby aggregation
  - Renaming axis levels for easy reshaping
  - Tidying when multiple variables are stored as column names
  - Tidying when multiple variables are stored as a single column
  - Tidying when two or more values are stored in the same cell
  - Tidying when variables are stored in column names and values
- Combining Pandas Objects
- Time Series Analysis
  - Introduction
  - Understanding the difference between Python and pandas date tools
  - Slicing time series intelligently
  - Filtering columns with time data
  - Using methods that only work with a DatetimeIndex
  - Counting the number of weekly crimes
  - Aggregating weekly crime and traffic accidents separately
  - Measuring crime by weekday and year
  - Grouping with anonymous functions with a DatetimeIndex
  - Grouping by a Timestamp and another column
- Visualization with Matplotlib, Pandas, and Seaborn
  - Introduction
  - Getting started with matplotlib
  - Object-oriented guide to matplotlib
  - Visualizing data with matplotlib
  - Plotting basics with pandas
  - Visualizing the flights dataset
  - Stacking area charts to discover emerging trends
  - Understanding the differences between seaborn and pandas
  - Multivariate analysis with seaborn Grids
  - Uncovering Simpson's Paradox in the diamonds dataset with seaborn
- Debugging and Testing Pandas
- Other Books You May Enjoy
- Index
Product information
- Title: Pandas 1.x Cookbook - Second Edition
- Author(s):
- Release date: February 2020
- Publisher(s): Packt Publishing
- ISBN: 9781839213106