Mastering Exploratory Analysis with pandas

Explore Python frameworks like pandas, Jupyter Notebooks, and Matplotlib to build data pipelines and data visualizations

Key Features

  • Learn to set up data analysis pipelines with pandas and Jupyter notebooks
  • Effective techniques for data selection, manipulation, and visualization
  • Introduction to Matplotlib for interactive data visualization using charts and plots

Book Description

pandas is a Python library that lets you manipulate, transform, and analyze data. It is a popular framework for exploratory data analysis, data visualization, and building data pipelines.

This book will be your practical guide to exploring datasets using pandas. You will start by setting up Python, pandas, and Jupyter Notebooks, and learn how to use Jupyter Notebooks to run Python code. You will then see how to get data into pandas and do some exploratory analysis, before learning how to manipulate and reshape data using pandas methods. You will also learn how to deal with missing data in your datasets, how to draw charts and plots using pandas and Matplotlib, and how to create effective visualizations for your audience. Finally, you will wrap up your newly gained pandas knowledge by learning how to export data from pandas into some popular file formats.

By the end of this book, you will have a better understanding of exploratory analysis and how to build exploratory data pipelines with Python.

What you will learn

  • Learn how to read different kinds of data into pandas DataFrames for data analysis
  • Manipulate, transform, and apply formulas to data imported into pandas DataFrames
  • Use pandas to analyze and visualize different kinds of data to gain real-world insights
  • Extract transformed data from pandas DataFrames and convert it into the formats your application expects
  • Manipulate and model time-series data, perform algorithmic trading, derive results on fixed and moving windows, and more
  • Effective data visualization using Matplotlib
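
As a rough illustration of the workflow these points describe, here is a minimal sketch that reads data into a DataFrame, fills a missing value, derives a moving-window result, and exports the outcome. The sample data, the column names (`date`, `city`, `sales`), and the variable names are hypothetical and are not taken from the book; only standard pandas calls are used.

```python
import io

import pandas as pd

# Hypothetical sample data; the columns are illustrative only.
csv_text = """date,city,sales
2021-01-01,Austin,10
2021-01-02,Austin,
2021-01-03,Austin,14
2021-01-04,Austin,18
"""

# Read CSV text into a DataFrame, parsing the dates and using them as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Handle missing data: fill the gap with the mean of the observed values.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Derive a result on a moving window of two consecutive rows.
df["sales_ma2"] = df["sales"].rolling(window=2).mean()

# Export the transformed data to formats an application might expect.
csv_out = df.to_csv()
json_out = df.to_json(orient="records")
```

Filling with the mean is only one of several possible strategies for missing values; dropping the row or forward-filling may suit other datasets better.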

Who this book is for

If you are a budding data scientist looking to learn the popular pandas library, or a Python developer looking to step into the world of data analysis, this book is the ideal resource to get you started. Some programming experience in Python will help you get the most out of this book.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Mastering Exploratory Analysis with pandas
  3. Packt Upsell
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Working with Different Kinds of Datasets
    1. Using advanced options while reading data from CSV files
      1. Importing modules
      2. Advanced read options
        1. Manipulating columns, index locations, and names
      3. Specifying a different row as a header
      4. Specifying a column as an index
      5. Choosing a subset of columns to be read
      6. Handling missing and NA data
      7. Choosing whether to skip over blank rows
        1. Data parsing options
      8. Skipping rows from the footer or end of the file
      9. Reading the subset of a file or a certain number of rows
    2. Reading data from Excel files
      1. Basic Excel read
      2. Specifying which sheet should be read
      3. Reading data from multiple sheets
        1. Finding out sheet names
        2. Choosing header or column labels
        3. No header
        4. Skipping rows at the beginning
        5. Skipping rows at the end
        6. Choosing columns
        7. Column names
        8. Setting an index while reading data
    3. Handling missing data while reading
    4. Reading data from other popular formats
      1. Reading a JSON file
      2. Reading JSON data into pandas
      3. Reading HTML data
      4. Reading a PICKLE file
      5. Reading SQL data
      6. Reading data from the clipboard
    5. Summary
  7. Data Selection
    1. Introduction to datasets
    2. Selecting data from the dataset
      1. Multi-column selection
      2. Dot notation
      3. Selecting multiple rows and columns from a pandas DataFrame
      4. Selecting a single row and multiple columns
      5. Selecting values from a range of rows and all columns
    3. Sorting a pandas DataFrame
    4. Filtering rows of a pandas DataFrame
    5. Applying multiple filter criteria to a pandas DataFrame
      1. Filtering based on multiple conditions – AND
      2. Filtering based on multiple conditions – OR
      3. Filtering using the isin method
        1. Using the isin method with multiple conditions
    6. Using the axis parameter in pandas
      1. Usage of the axis parameter
        1. Axis usage examples
        2. More examples of the axis keyword
      2. The axis keyword
    7. Using string methods in pandas
      1. Checking for a substring
      2. Changing the values of a series or column into uppercase
        1. Changing the values into lowercase
        2. Finding the length of every value of a column
        3. Removing white spaces
      3. Replacing parts of a column's values
    8. Changing the datatype of a pandas series
      1. Changing an int datatype column to a float
      2. Changing the datatype while reading data
      3. Converting string to datetime
    9. Summary
  8. Manipulating, Transforming, and Reshaping Data
    1. Modifying a pandas DataFrame using the inplace parameter
    2. Using the groupby method
    3. Handling missing values in pandas
    4. Indexing in pandas DataFrames
    5. Renaming columns in a pandas DataFrame
    6. Removing columns from a pandas DataFrame
    7. Working with date and time series data
    8. Handling SettingWithCopyWarning
    9. Applying a function to a pandas series or DataFrame
    10. Merging and concatenating multiple DataFrames into one
    11. Summary
  9. Visualizing Data Like a Pro
    1. Controlling plot aesthetics
      1. Our first plot with seaborn
      2. Changing the plot style with set_style
        1. Setting the plot background to a white grid
        2. Setting the plot background to dark
        3. Setting the background to white
        4. Adding ticks
      3. Customizing styles
        1. Style parameters
        2. Plotting context presets
    2. Choosing the colors for plots
      1. Changing the color palette
      2. Building custom color palettes
    3. Plotting categorical data
      1. Scatterplot
      2. Swarm plot
      3. Box plot
      4. Violin plot
      5. Bar plot
      6. Wide-form plot
    4. Plotting with Data-Aware Grids
      1. Plotting with the FacetGrid() method
      2. Plotting with the PairGrid() method
      3. Plotting with the PairPlot() method
    5. Summary
  10. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think