Pandas in Action

Book description

Take the next steps in your data science career! This friendly, hands-on guide shows you how to start mastering pandas using skills you already know from spreadsheet software.

In Pandas in Action you will learn how to:

  • Import datasets, identify issues with their data structures, and optimize them for efficiency
  • Sort, filter, pivot, and draw conclusions from a dataset and its subsets
  • Identify trends from text-based and time-based data
  • Organize, group, merge, and join separate datasets
  • Use a GroupBy object to store multiple DataFrames
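
As a rough illustration of the kind of operations listed above, here is a minimal sketch and not an excerpt from the book. The `sales` table, its column names, and its values are hypothetical and invented for this example. It filters and sorts a DataFrame, then uses a GroupBy object to hold one sub-DataFrame per group.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West", "East"],
        "product": ["A", "A", "B", "B", "A"],
        "units": [10, 4, 7, 12, 3],
    }
)

# Filter: keep rows with at least 7 units, then sort from largest to smallest.
big_orders = sales[sales["units"] >= 7].sort_values("units", ascending=False)

# Group: the GroupBy object bundles one sub-DataFrame per distinct region.
by_region = sales.groupby("region")

# Pull out the rows that belong to a single group.
east = by_region.get_group("East")

# Aggregate: total units sold in each region.
units_per_region = by_region["units"].sum()

print(big_orders)
print(east)
print(units_per_region)
```

Here `by_region` behaves like a container of smaller DataFrames keyed by region, which is one way to read "store multiple DataFrames" in the last bullet.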

Pandas has rapidly become one of Python's most popular data analysis libraries. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career. You’ll learn how easy pandas makes it to efficiently sort, analyze, filter, and munge almost any type of data.

About the Technology
Data analysis with Python doesn’t have to be hard. If you can use a spreadsheet, you can learn pandas! While its grid-style layouts may remind you of Excel, pandas is far more flexible and powerful. This Python library quickly performs operations on millions of rows, and it interfaces easily with other tools in the Python data ecosystem. It’s a perfect way to up your data game.

About the Book
Pandas in Action introduces Python-based data analysis using the amazing pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical—or impossible—in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world.

What's Inside
  • Organize, group, merge, split, and join datasets
  • Find trends in text-based and time-based data
  • Sort, filter, pivot, optimize, and draw conclusions
  • Apply aggregate operations

About the Reader
For readers experienced with spreadsheets and basic Python programming.

About the Author
Boris Paskhaver is a software engineer, Agile consultant, and online educator. His programming courses have been taken by 300,000 students across 190 countries.

Quotes
Of all the introductory pandas books I’ve read—and I did read a few—this is the best, by a mile.
- Erico Lendzian, idibu.com

This approachable guide will get you up and running quickly with all the basics you need to analyze your data.
- Jonathan Sharley, SiriusXM Media

Understanding and putting in practice the concepts of this book will help you increase productivity and make you look like a pro.
- Jose Apablaza, Steadfast Networks

Teaches both novice and expert Python users the essential concepts required for data analysis and data science.
- Ben McNamara, DataGeek

Table of contents

  1. Pandas in Action
  2. Dedication
  3. Copyright
  4. contents
  5. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized: A road map
      3. About the code
      4. liveBook discussion forum
      5. Other online resources
    4. about the author
    5. about the cover illustration
  6. Part 1. Core pandas
  7. 1 Introducing pandas
    1. 1.1 Data in the 21st century
    2. 1.2 Introducing pandas
      1. 1.2.1 Pandas vs. graphical spreadsheet applications
      2. 1.2.2 Pandas vs. its competitors
    3. 1.3 A tour of pandas
      1. 1.3.1 Importing a data set
      2. 1.3.2 Manipulating a DataFrame
      3. 1.3.3 Counting values in a Series
      4. 1.3.4 Filtering a column by one or more criteria
      5. 1.3.5 Grouping data
    4. Summary
  8. 2 The Series object
    1. 2.1 Overview of a Series
      1. 2.1.1 Classes and instances
      2. 2.1.2 Populating the Series with values
      3. 2.1.3 Customizing the Series index
      4. 2.1.4 Creating a Series with missing values
    2. 2.2 Creating a Series from Python objects
    3. 2.3 Series attributes
    4. 2.4 Retrieving the first and last rows
    5. 2.5 Mathematical operations
      1. 2.5.1 Statistical operations
      2. 2.5.2 Arithmetic operations
      3. 2.5.3 Broadcasting
    6. 2.6 Passing the Series to Python’s built-in functions
    7. 2.7 Coding challenge
      1. 2.7.1 Problems
      2. 2.7.2 Solutions
    8. Summary
  9. 3 Series methods
    1. 3.1 Importing a data set with the read_csv function
    2. 3.2 Sorting a Series
      1. 3.2.1 Sorting by values with the sort_values method
      2. 3.2.2 Sorting by index with the sort_index method
      3. 3.2.3 Retrieving the smallest and largest values with the nsmallest and nlargest methods
    3. 3.3 Overwriting a Series with the inplace parameter
    4. 3.4 Counting values with the value_counts method
    5. 3.5 Invoking a function on every Series value with the apply method
    6. 3.6 Coding challenge
      1. 3.6.1 Problems
      2. 3.6.2 Solutions
    7. Summary
  10. 4 The DataFrame object
    1. 4.1 Overview of a DataFrame
      1. 4.1.1 Creating a DataFrame from a dictionary
      2. 4.1.2 Creating a DataFrame from a NumPy ndarray
    2. 4.2 Similarities between Series and DataFrames
      1. 4.2.1 Importing a DataFrame with the read_csv function
      2. 4.2.2 Shared and exclusive attributes of Series and DataFrames
      3. 4.2.3 Shared methods of Series and DataFrames
    3. 4.3 Sorting a DataFrame
      1. 4.3.1 Sorting by a single column
      2. 4.3.2 Sorting by multiple columns
    4. 4.4 Sorting by index
      1. 4.4.1 Sorting by row index
      2. 4.4.2 Sorting by column index
    5. 4.5 Setting a new index
    6. 4.6 Selecting columns and rows from a DataFrame
      1. 4.6.1 Selecting a single column from a DataFrame
      2. 4.6.2 Selecting multiple columns from a DataFrame
    7. 4.7 Selecting rows from a DataFrame
      1. 4.7.1 Extracting rows by index label
      2. 4.7.2 Extracting rows by index position
      3. 4.7.3 Extracting values from specific columns
    8. 4.8 Extracting values from Series
    9. 4.9 Renaming columns or rows
    10. 4.10 Resetting an index
    11. 4.11 Coding challenge
      1. 4.11.1 Problems
      2. 4.11.2 Solutions
    12. Summary
  11. 5 Filtering a DataFrame
    1. 5.1 Optimizing a data set for memory use
      1. 5.1.1 Converting data types with the astype method
    2. 5.2 Filtering by a single condition
    3. 5.3 Filtering by multiple conditions
      1. 5.3.1 The AND condition
      2. 5.3.2 The OR condition
      3. 5.3.3 Inversion with ~
      4. 5.3.4 Methods for Booleans
    4. 5.4 Filtering by condition
      1. 5.4.1 The isin method
      2. 5.4.2 The between method
      3. 5.4.3 The isnull and notnull methods
      4. 5.4.4 Dealing with null values
    5. 5.5 Dealing with duplicates
      1. 5.5.1 The duplicated method
      2. 5.5.2 The drop_duplicates method
    6. 5.6 Coding challenge
      1. 5.6.1 Problems
      2. 5.6.2 Solutions
    7. Summary
  12. Part 2. Applied pandas
  13. 6 Working with text data
    1. 6.1 Letter casing and whitespace
    2. 6.2 String slicing
    3. 6.3 String slicing and character replacement
    4. 6.4 Boolean methods
    5. 6.5 Splitting strings
    6. 6.6 Coding challenge
      1. 6.6.1 Problems
      2. 6.6.2 Solutions
    7. 6.7 A note on regular expressions
    8. Summary
  14. 7 MultiIndex DataFrames
    1. 7.1 The MultiIndex object
    2. 7.2 MultiIndex DataFrames
    3. 7.3 Sorting a MultiIndex
    4. 7.4 Selecting with a MultiIndex
      1. 7.4.1 Extracting one or more columns
      2. 7.4.2 Extracting one or more rows with loc
      3. 7.4.3 Extracting one or more rows with iloc
    5. 7.5 Cross-sections
    6. 7.6 Manipulating the Index
      1. 7.6.1 Resetting the index
      2. 7.6.2 Setting the index
    7. 7.7 Coding challenge
      1. 7.7.1 Problems
      2. 7.7.2 Solutions
    8. Summary
  15. 8 Reshaping and pivoting
    1. 8.1 Wide vs. narrow data
    2. 8.2 Creating a pivot table from a DataFrame
      1. 8.2.1 The pivot_table method
      2. 8.2.2 Additional options for pivot tables
    3. 8.3 Stacking and unstacking index levels
    4. 8.4 Melting a data set
    5. 8.5 Exploding a list of values
    6. 8.6 Coding challenge
      1. 8.6.1 Problems
      2. 8.6.2 Solutions
    7. Summary
  16. 9 The GroupBy object
    1. 9.1 Creating a GroupBy object from scratch
    2. 9.2 Creating a GroupBy object from a data set
    3. 9.3 Attributes and methods of a GroupBy object
    4. 9.4 Aggregate operations
    5. 9.5 Applying a custom operation to all groups
    6. 9.6 Grouping by multiple columns
    7. 9.7 Coding challenge
      1. 9.7.1 Problems
      2. 9.7.2 Solutions
    8. Summary
  17. 10 Merging, joining, and concatenating
    1. 10.1 Introducing the data sets
    2. 10.2 Concatenating data sets
    3. 10.3 Missing values in concatenated DataFrames
    4. 10.4 Left joins
    5. 10.5 Inner joins
    6. 10.6 Outer joins
    7. 10.7 Merging on index labels
    8. 10.8 Coding challenge
      1. 10.8.1 Problems
      2. 10.8.2 Solutions
    9. Summary
  18. 11 Working with dates and times
    1. 11.1 Introducing the Timestamp object
      1. 11.1.1 How Python works with datetimes
      2. 11.1.2 How pandas works with datetimes
    2. 11.2 Storing multiple timestamps in a DatetimeIndex
    3. 11.3 Converting column or index values to datetimes
    4. 11.4 Using the DatetimeProperties object
    5. 11.5 Adding and subtracting durations of time
    6. 11.6 Date offsets
    7. 11.7 The Timedelta object
    8. 11.8 Coding challenge
      1. 11.8.1 Problems
      2. 11.8.2 Solutions
    9. Summary
  19. 12 Imports and exports
    1. 12.1 Reading from and writing to JSON files
      1. 12.1.1 Loading a JSON file into a DataFrame
      2. 12.1.2 Exporting a DataFrame to a JSON file
    2. 12.2 Reading from and writing to CSV files
    3. 12.3 Reading from and writing to Excel workbooks
      1. 12.3.1 Installing the xlrd and openpyxl libraries in an Anaconda environment
      2. 12.3.2 Importing Excel workbooks
      3. 12.3.3 Exporting Excel workbooks
    4. 12.4 Coding challenge
      1. 12.4.1 Problems
      2. 12.4.2 Solutions
    5. Summary
  20. 13 Configuring pandas
    1. 13.1 Getting and setting pandas options
    2. 13.2 Precision
    3. 13.3 Maximum column width
    4. 13.4 Chop threshold
    5. 13.5 Option context
    6. Summary
  21. 14 Visualization
    1. 14.1 Installing matplotlib
    2. 14.2 Line charts
    3. 14.3 Bar graphs
    4. 14.4 Pie charts
    5. Summary
  22. Appendix A. Installation and setup
    1. A.1 The Anaconda distribution
    2. A.2 The macOS setup process
      1. A.2.1 Installing Anaconda in macOS
      2. A.2.2 Launching Terminal
      3. A.2.3 Common Terminal commands
    3. A.3 The Windows setup process
      1. A.3.1 Installing Anaconda in Windows
      2. A.3.2 Launching Anaconda Prompt
      3. A.3.3 Common Anaconda Prompt commands
    4. A.4 Creating a new Anaconda environment
    5. A.5 Anaconda Navigator
    6. A.6 The basics of Jupyter Notebook
  23. Appendix B. Python crash course
    1. B.1 Simple data types
      1. B.1.1 Numbers
      2. B.1.2 Strings
      3. B.1.3 Booleans
      4. B.1.4 The None object
    2. B.2 Operators
      1. B.2.1 Mathematical operators
      2. B.2.2 Equality and inequality operators
    3. B.3 Variables
    4. B.4 Functions
      1. B.4.1 Arguments and return values
      2. B.4.2 Custom functions
    5. B.5 Modules
    6. B.6 Classes and objects
    7. B.7 Attributes and methods
    8. B.8 String methods
    9. B.9 Lists
      1. B.9.1 List iteration
      2. B.9.2 List comprehension
      3. B.9.3 Converting a string to a list and vice versa
    10. B.10 Tuples
    11. B.11 Dictionaries
      1. B.11.1 Dictionary iteration
    12. B.12 Sets
  24. Appendix C. NumPy crash course
    1. C.1 Dimensions
    2. C.2 The ndarray object
      1. C.2.1 Generating a numeric range with the arange method
      2. C.2.2 Attributes on an ndarray object
      3. C.2.3 The reshape method
      4. C.2.4 The randint function
      5. C.2.5 The randn function
    3. C.3 The nan object
  25. Appendix D. Generating fake data with Faker
    1. D.1 Installing Faker
    2. D.2 Getting started with Faker
    3. D.3 Populating a DataFrame with fake values
  26. Appendix E. Regular expressions
    1. E.1 Introduction to Python’s re module
    2. E.2 Metacharacters
    3. E.3 Advanced search patterns
    4. E.4 Regular expressions and pandas
  27. index

Product information

  • Title: Pandas in Action
  • Author(s): Boris Paskhaver
  • Release date: September 2021
  • Publisher(s): Manning Publications
  • ISBN: 9781617297434