Book Description
Get to grips with pandas, a versatile and high-performance Python library for data manipulation, analysis, and discovery
About This Book
 Get comfortable using pandas and Python as an effective data exploration and analysis tool
 Explore pandas through a framework of data analysis, with an explanation of how pandas is well suited for the various stages in a data analysis process
 A comprehensive guide to pandas with many clear and practical examples to help you get up and running with pandas
Who This Book Is For
This book is ideal for data scientists, data analysts, Python programmers who want to plunge into data analysis using pandas, and anyone with a curiosity about analyzing data. Some knowledge of statistics and programming is helpful for getting the most out of this book but is not strictly required. Prior exposure to pandas is also not required.
What You Will Learn
 Understand how data analysts and scientists think about the processes of gathering and understanding data
 Learn how pandas can be used to support the end-to-end process of data analysis
 Use pandas Series and DataFrame objects to represent single and multivariate data
 Slice and dice data with pandas, as well as combine, group, and aggregate data from multiple sources
 Access data from external sources such as files, databases, and web services
 Represent and manipulate time-series data and many of the intricacies involved with this type of data
 Visualize statistical information
 Use pandas to solve several common data representation and analysis problems within finance
In Detail
You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and progress step by step from modeling data, to accessing data from remote sources, to performing numeric and statistical analysis, through indexing and aggregate analysis, and finally to visualizing statistical data and applying pandas to finance.
By the end of this book, you will know pandas and how it can empower you in the world of data manipulation, analysis, and science.
Style and approach
 Step-by-step instruction on using pandas within an end-to-end framework of performing data analysis
 Practical demonstration of Python and pandas through interactive and incremental examples
Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.
Table of Contents
 Preface

pandas and Data Analysis
 Introducing pandas
 Data manipulation, analysis, science, and pandas
 The process of data analysis
 Relating the book to the process
 Concepts of data and analysis in our tour of pandas
 Other Python libraries of value with pandas
 Summary
 Up and Running with pandas

Representing Univariate Data with the Series
 Configuring pandas
 Creating a Series
 The .index and .values properties
 The size and shape of a Series
 Specifying an index at creation
 Heads, tails, and takes
 Retrieving values in a Series by label or position
 Slicing a Series into subsets
 Alignment via index labels
 Performing Boolean selection
 Reindexing a Series
 Modifying a Series in-place
 Summary
 Representing Tabular and Multivariate Data with the DataFrame

Manipulating DataFrame Structure
 Configuring pandas
 Renaming columns
 Adding new columns with [] and .insert()
 Adding columns through enlargement
 Adding columns using concatenation
 Reordering columns
 Replacing the contents of a column
 Deleting columns
 Appending new rows
 Concatenating rows
 Adding and replacing rows via enlargement
 Removing rows using .drop()
 Removing rows using Boolean selection
 Removing rows using a slice
 Summary

Indexing Data
 Configuring pandas
 The importance of indexes

The pandas index types
 The fundamental type - Index
 Integer index labels using Int64Index and RangeIndex
 Floating-point labels using Float64Index
 Representing discrete intervals using IntervalIndex
 Categorical values as an index - CategoricalIndex
 Indexing by date and time using DatetimeIndex
 Indexing periods of time using PeriodIndex
 Working with Indexes
 Hierarchical indexing
 Summary
 Categorical Data

Numerical and Statistical Methods
 Configuring pandas
 Performing numerical methods on pandas objects

Performing statistical processes on pandas objects
 Retrieving summary descriptive statistics
 Measuring central tendency: mean, median, and mode
 Calculating variance and standard deviation
 Determining covariance and correlation
 Performing discretization and quantiling of data
 Calculating the rank of values
 Calculating the percent change at each sample of a series
 Performing moving-window operations
 Executing random sampling of data
 Summary

Accessing Data
 Configuring pandas

Working with CSV and text/tabular format data
 Examining the sample CSV data set
 Reading a CSV file into a DataFrame
 Specifying the index column when reading a CSV file
 Data type inference and specification
 Specifying column names
 Specifying specific columns to load
 Saving DataFrame to a CSV file
 Working with general field-delimited data
 Handling variants of formats in field-delimited data
 Reading and writing data in Excel format
 Reading and writing JSON files
 Reading HTML data from the web
 Reading and writing HDF5 format files
 Accessing CSV data on the web
 Reading and writing from/to SQL databases
 Reading data from remote data services
 Summary
 Tidying Up Your Data
 Combining, Relating, and Reshaping Data
 Data Aggregation

Time-Series Modelling
 Setting up the IPython notebook
 Representation of dates, time, and intervals
 Introducing time-series data
 Calculating new dates using offsets
 Representing durations of time using Period
 Handling holidays using calendars
 Normalizing timestamps using time zones
 Manipulating time-series data
 Time-series moving-window operations
 Summary

Visualization
 Configuring pandas
 Plotting basics with pandas
 Creating time-series charts

Common plots used in statistical analyses
 Showing relative differences with bar plots
 Picturing distributions of data with histograms
 Depicting distributions of categorical data with box and whisker charts
 Demonstrating cumulative totals with area plots
 Relationships between two variables with scatter plots
 Estimates of distribution with the kernel density plot
 Correlations between multiple variables with the scatter plot matrix
 Strengths of relationships in multiple variables with heatmaps
 Manually rendering multiple plots in a single chart
 Summary

Historical Stock Price Analysis
 Setting up the IPython notebook
 Obtaining and organizing stock data from Google
 Plotting time-series prices
 Plotting volume-series data
 Calculating the simple daily percentage change in closing price
 Calculating simple daily cumulative returns of a stock
 Resampling data from daily to monthly returns
 Analyzing distribution of returns
 Performing a moving-average calculation
 Comparison of average daily returns across stocks
 Correlation of stocks based on the daily percentage change of the closing price
 Calculating the volatility of stocks
 Determining risk relative to expected returns
 Summary
Product Information
 Title: Learning pandas - Second Edition
 Author(s):
 Release date: June 2017
 Publisher(s): Packt Publishing
 ISBN: 9781787123137