Pandas in Action

Book description

Pandas in Action introduces Python-based data analysis using the pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical, or impossible, in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world.

Table of contents

  1. Pandas in Action
  2. Dedication
  3. Copyright
  4. contents
  5. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized: A road map
      3. About the code
      4. liveBook discussion forum
      5. Other online resources
    4. about the author
    5. about the cover illustration
  6. Part 1. Core pandas
  7. 1 Introducing pandas
    1. 1.1 Data in the 21st century
    2. 1.2 Introducing pandas
      1. 1.2.1 Pandas vs. graphical spreadsheet applications
      2. 1.2.2 Pandas vs. its competitors
    3. 1.3 A tour of pandas
      1. 1.3.1 Importing a data set
      2. 1.3.2 Manipulating a DataFrame
      3. 1.3.3 Counting values in a Series
      4. 1.3.4 Filtering a column by one or more criteria
      5. 1.3.5 Grouping data
    4. Summary
  8. 2 The Series object
    1. 2.1 Overview of a Series
      1. 2.1.1 Classes and instances
      2. 2.1.2 Populating the Series with values
      3. 2.1.3 Customizing the Series index
      4. 2.1.4 Creating a Series with missing values
    2. 2.2 Creating a Series from Python objects
    3. 2.3 Series attributes
    4. 2.4 Retrieving the first and last rows
    5. 2.5 Mathematical operations
      1. 2.5.1 Statistical operations
      2. 2.5.2 Arithmetic operations
      3. 2.5.3 Broadcasting
    6. 2.6 Passing the Series to Python’s built-in functions
    7. 2.7 Coding challenge
      1. 2.7.1 Problems
      2. 2.7.2 Solutions
    8. Summary
  9. 3 Series methods
    1. 3.1 Importing a data set with the read_csv function
    2. 3.2 Sorting a Series
      1. 3.2.1 Sorting by values with the sort_values method
      2. 3.2.2 Sorting by index with the sort_index method
      3. 3.2.3 Retrieving the smallest and largest values with the nsmallest and nlargest methods
    3. 3.3 Overwriting a Series with the inplace parameter
    4. 3.4 Counting values with the value_counts method
    5. 3.5 Invoking a function on every Series value with the apply method
    6. 3.6 Coding challenge
      1. 3.6.1 Problems
      2. 3.6.2 Solutions
    7. Summary
  10. 4 The DataFrame object
    1. 4.1 Overview of a DataFrame
      1. 4.1.1 Creating a DataFrame from a dictionary
      2. 4.1.2 Creating a DataFrame from a NumPy ndarray
    2. 4.2 Similarities between Series and DataFrames
      1. 4.2.1 Importing a DataFrame with the read_csv function
      2. 4.2.2 Shared and exclusive attributes of Series and DataFrames
      3. 4.2.3 Shared methods of Series and DataFrames
    3. 4.3 Sorting a DataFrame
      1. 4.3.1 Sorting by a single column
      2. 4.3.2 Sorting by multiple columns
    4. 4.4 Sorting by index
      1. 4.4.1 Sorting by row index
      2. 4.4.2 Sorting by column index
    5. 4.5 Setting a new index
    6. 4.6 Selecting columns and rows from a DataFrame
      1. 4.6.1 Selecting a single column from a DataFrame
      2. 4.6.2 Selecting multiple columns from a DataFrame
    7. 4.7 Selecting rows from a DataFrame
      1. 4.7.1 Extracting rows by index label
      2. 4.7.2 Extracting rows by index position
      3. 4.7.3 Extracting values from specific columns
    8. 4.8 Extracting values from Series
    9. 4.9 Renaming columns or rows
    10. 4.10 Resetting an index
    11. 4.11 Coding challenge
      1. 4.11.1 Problems
      2. 4.11.2 Solutions
    12. Summary
  11. 5 Filtering a DataFrame
    1. 5.1 Optimizing a data set for memory use
      1. 5.1.1 Converting data types with the astype method
    2. 5.2 Filtering by a single condition
    3. 5.3 Filtering by multiple conditions
      1. 5.3.1 The AND condition
      2. 5.3.2 The OR condition
      3. 5.3.3 Inversion with ~
      4. 5.3.4 Methods for Booleans
    4. 5.4 Filtering by condition
      1. 5.4.1 The isin method
      2. 5.4.2 The between method
      3. 5.4.3 The isnull and notnull methods
      4. 5.4.4 Dealing with null values
    5. 5.5 Dealing with duplicates
      1. 5.5.1 The duplicated method
      2. 5.5.2 The drop_duplicates method
    6. 5.6 Coding challenge
      1. 5.6.1 Problems
      2. 5.6.2 Solutions
    7. Summary
  12. Part 2. Applied pandas
  13. 6 Working with text data
    1. 6.1 Letter casing and whitespace
    2. 6.2 String slicing
    3. 6.3 String slicing and character replacement
    4. 6.4 Boolean methods
    5. 6.5 Splitting strings
    6. 6.6 Coding challenge
      1. 6.6.1 Problems
      2. 6.6.2 Solutions
    7. 6.7 A note on regular expressions
    8. Summary
  14. 7 MultiIndex DataFrames
    1. 7.1 The MultiIndex object
    2. 7.2 MultiIndex DataFrames
    3. 7.3 Sorting a MultiIndex
    4. 7.4 Selecting with a MultiIndex
      1. 7.4.1 Extracting one or more columns
      2. 7.4.2 Extracting one or more rows with loc
      3. 7.4.3 Extracting one or more rows with iloc
    5. 7.5 Cross-sections
    6. 7.6 Manipulating the Index
      1. 7.6.1 Resetting the index
      2. 7.6.2 Setting the index
    7. 7.7 Coding challenge
      1. 7.7.1 Problems
      2. 7.7.2 Solutions
    8. Summary
  15. 8 Reshaping and pivoting
    1. 8.1 Wide vs. narrow data
    2. 8.2 Creating a pivot table from a DataFrame
      1. 8.2.1 The pivot_table method
      2. 8.2.2 Additional options for pivot tables
    3. 8.3 Stacking and unstacking index levels
    4. 8.4 Melting a data set
    5. 8.5 Exploding a list of values
    6. 8.6 Coding challenge
      1. 8.6.1 Problems
      2. 8.6.2 Solutions
    7. Summary
  16. 9 The GroupBy object
    1. 9.1 Creating a GroupBy object from scratch
    2. 9.2 Creating a GroupBy object from a data set
    3. 9.3 Attributes and methods of a GroupBy object
    4. 9.4 Aggregate operations
    5. 9.5 Applying a custom operation to all groups
    6. 9.6 Grouping by multiple columns
    7. 9.7 Coding challenge
      1. 9.7.1 Problems
      2. 9.7.2 Solutions
    8. Summary
  17. 10 Merging, joining, and concatenating
    1. 10.1 Introducing the data sets
    2. 10.2 Concatenating data sets
    3. 10.3 Missing values in concatenated DataFrames
    4. 10.4 Left joins
    5. 10.5 Inner joins
    6. 10.6 Outer joins
    7. 10.7 Merging on index labels
    8. 10.8 Coding challenge
      1. 10.8.1 Problems
      2. 10.8.2 Solutions
    9. Summary
  18. 11 Working with dates and times
    1. 11.1 Introducing the Timestamp object
      1. 11.1.1 How Python works with datetimes
      2. 11.1.2 How pandas works with datetimes
    2. 11.2 Storing multiple timestamps in a DatetimeIndex
    3. 11.3 Converting column or index values to datetimes
    4. 11.4 Using the DatetimeProperties object
    5. 11.5 Adding and subtracting durations of time
    6. 11.6 Date offsets
    7. 11.7 The Timedelta object
    8. 11.8 Coding challenge
      1. 11.8.1 Problems
      2. 11.8.2 Solutions
    9. Summary
  19. 12 Imports and exports
    1. 12.1 Reading from and writing to JSON files
      1. 12.1.1 Loading a JSON file into a DataFrame
      2. 12.1.2 Exporting a DataFrame to a JSON file
    2. 12.2 Reading from and writing to CSV files
    3. 12.3 Reading from and writing to Excel workbooks
      1. 12.3.1 Installing the xlrd and openpyxl libraries in an Anaconda environment
      2. 12.3.2 Importing Excel workbooks
      3. 12.3.3 Exporting Excel workbooks
    4. 12.4 Coding challenge
      1. 12.4.1 Problems
      2. 12.4.2 Solutions
    5. Summary
  20. 13 Configuring pandas
    1. 13.1 Getting and setting pandas options
    2. 13.2 Precision
    3. 13.3 Maximum column width
    4. 13.4 Chop threshold
    5. 13.5 Option context
    6. Summary
  21. 14 Visualization
    1. 14.1 Installing matplotlib
    2. 14.2 Line charts
    3. 14.3 Bar graphs
    4. 14.4 Pie charts
    5. Summary
  22. Appendix A. Installation and setup
    1. A.1 The Anaconda distribution
    2. A.2 The macOS setup process
      1. A.2.1 Installing Anaconda in macOS
      2. A.2.2 Launching Terminal
      3. A.2.3 Common Terminal commands
    3. A.3 The Windows setup process
      1. A.3.1 Installing Anaconda in Windows
      2. A.3.2 Launching Anaconda Prompt
      3. A.3.3 Common Anaconda Prompt commands
    4. A.4 Creating a new Anaconda environment
    5. A.5 Anaconda Navigator
    6. A.6 The basics of Jupyter Notebook
  23. Appendix B. Python crash course
    1. B.1 Simple data types
      1. B.1.1 Numbers
      2. B.1.2 Strings
      3. B.1.3 Booleans
      4. B.1.4 The None object
    2. B.2 Operators
      1. B.2.1 Mathematical operators
      2. B.2.2 Equality and inequality operators
    3. B.3 Variables
    4. B.4 Functions
      1. B.4.1 Arguments and return values
      2. B.4.2 Custom functions
    5. B.5 Modules
    6. B.6 Classes and objects
    7. B.7 Attributes and methods
    8. B.8 String methods
    9. B.9 Lists
      1. B.9.1 List iteration
      2. B.9.2 List comprehension
      3. B.9.3 Converting a string to a list and vice versa
    10. B.10 Tuples
    11. B.11 Dictionaries
      1. B.11.1 Dictionary iteration
    12. B.12 Sets
  24. Appendix C. NumPy crash course
    1. C.1 Dimensions
    2. C.2 The ndarray object
      1. C.2.1 Generating a numeric range with the arange method
      2. C.2.2 Attributes on an ndarray object
      3. C.2.3 The reshape method
      4. C.2.4 The randint function
      5. C.2.5 The randn function
    3. C.3 The nan object
  25. Appendix D. Generating fake data with Faker
    1. D.1 Installing Faker
    2. D.2 Getting started with Faker
    3. D.3 Populating a DataFrame with fake values
  26. Appendix E. Regular expressions
    1. E.1 Introduction to Python’s re module
    2. E.2 Metacharacters
    3. E.3 Advanced search patterns
    4. E.4 Regular expressions and pandas
  27. index

Product information

  • Title: Pandas in Action
  • Author(s): Boris Paskhaver
  • Release date: September 2021
  • Publisher(s): Manning Publications
  • ISBN: 9781617297434