Learning pandas

Book description

Get to grips with pandas, a versatile and high-performance Python library for data manipulation, analysis, and discovery

In Detail

This learner's guide will help you understand how to use the features of pandas for interactive data manipulation and analysis.

This book is your ideal guide to learning pandas. It takes you all the way from installing it to creating one- and two-dimensional indexed data structures, indexing and slicing-and-dicing that data to derive results, loading data from local and Internet-based resources, and finally creating effective visualizations to form quick insights. You start with an overview of pandas and NumPy, then dive into the details of pandas' Series and DataFrame objects, and finish with a quick review of using pandas to solve several problems in finance.

With the knowledge you gain from this book, you will be able to quickly begin your journey into the exciting world of data science and analysis.

What You Will Learn

  • Install pandas on Windows, Mac, and Linux using the Anaconda Python distribution
  • Learn how pandas builds on NumPy to implement flexible indexed data structures
  • Adopt pandas' Series and DataFrame objects to represent one- and two-dimensional data constructs
  • Index, slice, and transform data to derive meaning from information
  • Load data from files, databases, and web services
  • Manipulate dates, times, and time series data
  • Group, aggregate, and summarize data
  • Apply visualization techniques to pandas and statistical data

Table of contents

  1. Learning pandas
    1. Table of Contents
    2. Learning pandas
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. A Tour of pandas
      1. pandas and why it is important
      2. pandas and IPython Notebooks
      3. Referencing pandas in the application
      4. Primary pandas objects
        1. The pandas Series object
        2. The pandas DataFrame object
      5. Loading data from files and the Web
        1. Loading CSV data from files
        2. Loading data from the Web
      6. Simplicity of visualization of pandas data
      7. Summary
    9. 2. Installing pandas
      1. Getting Anaconda
      2. Installing Anaconda
        1. Installing Anaconda on Linux
        2. Installing Anaconda on Mac OS X
        3. Installing Anaconda on Windows
      3. Ensuring pandas is up to date
      4. Running a small pandas sample in IPython
      5. Starting the IPython Notebook server
      6. Installing and running IPython Notebooks
      7. Using Wakari for pandas
      8. Summary
    10. 3. NumPy for pandas
      1. Installing and importing NumPy
      2. Benefits and characteristics of NumPy arrays
      3. Creating NumPy arrays and performing basic array operations
      4. Selecting array elements
      5. Logical operations on arrays
      6. Slicing arrays
      7. Reshaping arrays
      8. Combining arrays
      9. Splitting arrays
      10. Useful numerical methods of NumPy arrays
      11. Summary
    11. 4. The pandas Series Object
      1. The Series object
      2. Importing pandas
      3. Creating Series
      4. Size, shape, uniqueness, and counts of values
      5. Peeking at data with heads, tails, and take
      6. Looking up values in Series
        1. Alignment via index labels
      7. Arithmetic operations
      8. The special case of Not-A-Number (NaN)
      9. Boolean selection
      10. Reindexing a Series
        1. Modifying a Series in-place
      11. Slicing a Series
      12. Summary
    12. 5. The pandas DataFrame Object
      1. Creating DataFrame from scratch
      2. Example data
        1. S&P 500
        2. Monthly stock historical prices
      3. Selecting columns of a DataFrame
      4. Selecting rows and values of a DataFrame using the index
        1. Slicing using the [] operator
        2. Selecting rows by index label and location: .loc[] and .iloc[]
        3. Selecting rows by index label and/or location: .ix[]
        4. Scalar lookup by label or location using .at[] and .iat[]
      5. Selecting rows of a DataFrame by Boolean selection
      6. Modifying the structure and content of DataFrame
        1. Renaming columns
        2. Adding and inserting columns
        3. Replacing the contents of a column
        4. Deleting columns in a DataFrame
        5. Adding rows to a DataFrame
          1. Appending rows with .append()
          2. Concatenating DataFrame objects with pd.concat()
          3. Adding rows (and columns) via setting with enlargement
        6. Removing rows from a DataFrame
          1. Removing rows using .drop()
          2. Removing rows using Boolean selection
          3. Removing rows using a slice
        7. Changing scalar values in a DataFrame
      7. Arithmetic on a DataFrame
      8. Resetting and reindexing
      9. Hierarchical indexing
      10. Summarized data and descriptive statistics
      11. Summary
    13. 6. Accessing Data
      1. Setting up the IPython notebook
        1. CSV and Text/Tabular format
          1. The sample CSV data set
          2. Reading a CSV file into a DataFrame
          3. Specifying the index column when reading a CSV file
          4. Data type inference and specification
          5. Specifying column names
          6. Specifying specific columns to load
          7. Saving DataFrame to a CSV file
        2. General field-delimited data
          1. Handling noise rows in field-delimited data
        3. Reading and writing data in an Excel format
      2. Reading and writing JSON files
        1. Reading HTML data from the Web
        2. Reading and writing HDF5 format files
      3. Accessing data on the web and in the cloud
      4. Reading and writing from/to SQL databases
      5. Reading data from remote data services
        1. Reading stock data from Yahoo! and Google Finance
          1. Retrieving data from Yahoo! Finance Options
          2. Reading economic data from the Federal Reserve Bank of St. Louis
          3. Accessing Kenneth French's data
          4. Reading from the World Bank
      6. Summary
    14. 7. Tidying Up Your Data
      1. What is tidying your data?
      2. Setting up the IPython notebook
      3. Working with missing data
        1. Determining NaN values in Series and DataFrame objects
        2. Selecting out or dropping missing data
        3. How pandas handles NaN values in mathematical operations
        4. Filling in missing data
        5. Forward and backward filling of missing values
        6. Filling using index labels
        7. Interpolation of missing values
      4. Handling duplicate data
      5. Transforming Data
        1. Mapping
        2. Replacing values
        3. Applying functions to transform data
      6. Summary
    15. 8. Combining and Reshaping Data
      1. Setting up the IPython notebook
      2. Concatenating data
      3. Merging and joining data
        1. An overview of merges
        2. Specifying the join semantics of a merge operation
        3. Pivoting
      4. Stacking and unstacking
        1. Stacking using nonhierarchical indexes
        2. Unstacking using hierarchical indexes
        3. Melting
      5. Performance benefits of stacked data
      6. Summary
    16. 9. Grouping and Aggregating Data
      1. Setting up the IPython notebook
      2. The split, apply, and combine (SAC) pattern
      3. Split
        1. Data for the examples
        2. Grouping by a single column's values
        3. Accessing the results of grouping
        4. Grouping using index levels
      4. Apply
        1. Applying aggregation functions to groups
        2. The transformation of group data
          1. An overview of transformation
          2. Practical examples of transformation
        3. Filtering groups
      5. Discretization and Binning
      6. Summary
    17. 10. Time-series Data
      1. Setting up the IPython notebook
      2. Representation of dates, time, and intervals
        1. The datetime, date, and time objects
        2. Timestamp objects
        3. Timedelta
      3. Introducing time-series data
        1. DatetimeIndex
        2. Creating time-series data with specific frequencies
      4. Calculating new dates using offsets
        1. Date offsets
        2. Anchored offsets
        3. Representing durations of time using Period objects
        4. The Period object
        5. PeriodIndex
      5. Handling holidays using calendars
      6. Normalizing timestamps using time zones
      7. Manipulating time-series data
        1. Shifting and lagging
        2. Frequency conversion
        3. Up and down resampling
        4. Time-series moving-window operations
      8. Summary
    18. 11. Visualization
      1. Setting up the IPython notebook
      2. Plotting basics with pandas
        1. Creating time-series charts with .plot()
        2. Adorning and styling your time-series plot
          1. Adding a title and changing axes labels
          2. Specifying the legend content and position
          3. Specifying line colors, styles, thickness, and markers
          4. Specifying tick mark locations and tick labels
          5. Formatting axes tick date labels using formatters
      3. Common plots used in statistical analyses
        1. Bar plots
        2. Histograms
        3. Box and whisker charts
        4. Area plots
        5. Scatter plots
        6. Density plot
        7. The scatter plot matrix
        8. Heatmaps
      4. Multiple plots in a single chart
      5. Summary
    19. 12. Applications to Finance
      1. Setting up the IPython notebook
      2. Obtaining and organizing stock data from Yahoo!
      3. Plotting time-series prices
        1. Plotting volume-series data
        2. Calculating the simple daily percentage change
        3. Calculating simple daily cumulative returns
        4. Resampling data from daily to monthly returns
        5. Analyzing distribution of returns
      4. Performing a moving-average calculation
        1. The comparison of average daily returns across stocks
        2. The correlation of stocks based on the daily percentage change of the closing price
      5. Volatility calculation
      6. Determining risk relative to expected returns
      7. Summary
    20. Index

Product information

  • Title: Learning pandas
  • Author(s): Michael Heydt
  • Release date: April 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781783985128