Book description
Get to grips with pandas—a versatile and high-performance Python library for data manipulation, analysis, and discovery
About This Book
Get comfortable using pandas and Python as an effective data exploration and analysis tool
Explore pandas through a framework of data analysis, with an explanation of how pandas is well suited for the various stages in a data analysis process
A comprehensive guide to pandas with many clear and practical examples to help you get up and running with pandas
Who This Book Is For
This book is ideal for data scientists, data analysts, and Python programmers who want to plunge into data analysis using pandas, and for anyone with a curiosity about analyzing data. Some knowledge of statistics and programming is helpful for getting the most out of this book but is not strictly required. Prior exposure to pandas is also not required.
What You Will Learn
Understand how data analysts and scientists think about the processes of gathering and understanding data
Learn how pandas can be used to support the end-to-end process of data analysis
Use pandas Series and DataFrame objects to represent single and multivariate data
Slice and dice data with pandas, and combine, group, and aggregate data from multiple sources
Access data from external sources such as files, databases, and web services
Represent and manipulate time-series data and the many intricacies involved with this type of data
Visualize statistical information
Use pandas to solve several common data representation and analysis problems within finance
In Detail
You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and progress step by step from modeling data, to accessing data from remote sources, to performing numeric and statistical analysis, through indexing and aggregate analysis, and finally to visualizing statistical data and applying pandas to finance.
With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis, and science.
Style and approach
Step-by-step instruction on using pandas within an end-to-end framework for performing data analysis
Practical demonstrations of Python and pandas through interactive and incremental examples
Table of contents
- Preface
- pandas and Data Analysis
- Introducing pandas
- Data manipulation, analysis, science, and pandas
- The process of data analysis
- Relating the book to the process
- Concepts of data and analysis in our tour of pandas
- Other Python libraries of value with pandas
- Summary
- Up and Running with pandas
- Representing Univariate Data with the Series
- Configuring pandas
- Creating a Series
- The .index and .values properties
- The size and shape of a Series
- Specifying an index at creation
- Heads, tails, and takes
- Retrieving values in a Series by label or position
- Slicing a Series into subsets
- Alignment via index labels
- Performing Boolean selection
- Re-indexing a Series
- Modifying a Series in-place
- Summary
- Representing Tabular and Multivariate Data with the DataFrame
- Manipulating DataFrame Structure
- Configuring pandas
- Renaming columns
- Adding new columns with [] and .insert()
- Adding columns through enlargement
- Adding columns using concatenation
- Reordering columns
- Replacing the contents of a column
- Deleting columns
- Appending new rows
- Concatenating rows
- Adding and replacing rows via enlargement
- Removing rows using .drop()
- Removing rows using Boolean selection
- Removing rows using a slice
- Summary
- Indexing Data
- Configuring pandas
- The importance of indexes
- The pandas index types
- The fundamental type - Index
- Integer index labels using Int64Index and RangeIndex
- Floating-point labels using Float64Index
- Representing discrete intervals using IntervalIndex
- Categorical values as an index - CategoricalIndex
- Indexing by date and time using DatetimeIndex
- Indexing periods of time using PeriodIndex
- Working with Indexes
- Hierarchical indexing
- Summary
- Categorical Data
- Numerical and Statistical Methods
- Configuring pandas
- Performing numerical methods on pandas objects
- Performing statistical processes on pandas objects
- Retrieving summary descriptive statistics
- Measuring central tendency: mean, median, and mode
- Calculating variance and standard deviation
- Determining covariance and correlation
- Performing discretization and quantiling of data
- Calculating the rank of values
- Calculating the percent change at each sample of a series
- Performing moving-window operations
- Executing random sampling of data
- Summary
- Accessing Data
- Configuring pandas
- Working with CSV and text/tabular format data
- Examining the sample CSV data set
- Reading a CSV file into a DataFrame
- Specifying the index column when reading a CSV file
- Data type inference and specification
- Specifying column names
- Specifying specific columns to load
- Saving DataFrame to a CSV file
- Working with general field-delimited data
- Handling variants of formats in field-delimited data
- Reading and writing data in Excel format
- Reading and writing JSON files
- Reading HTML data from the web
- Reading and writing HDF5 format files
- Accessing CSV data on the web
- Reading and writing from/to SQL databases
- Reading data from remote data services
- Summary
- Tidying Up Your Data
- Combining, Relating, and Reshaping Data
- Data Aggregation
- Time-Series Modelling
- Setting up the IPython notebook
- Representation of dates, time, and intervals
- Introducing time-series data
- Calculating new dates using offsets
- Representing durations of time using Period
- Handling holidays using calendars
- Normalizing timestamps using time zones
- Manipulating time-series data
- Time-series moving-window operations
- Summary
- Visualization
- Configuring pandas
- Plotting basics with pandas
- Creating time-series charts
- Common plots used in statistical analyses
- Showing relative differences with bar plots
- Picturing distributions of data with histograms
- Depicting distributions of categorical data with box and whisker charts
- Demonstrating cumulative totals with area plots
- Relationships between two variables with scatter plots
- Estimates of distribution with the kernel density plot
- Correlations between multiple variables with the scatter plot matrix
- Strengths of relationships in multiple variables with heatmaps
- Manually rendering multiple plots in a single chart
- Summary
- Historical Stock Price Analysis
- Setting up the IPython notebook
- Obtaining and organizing stock data from Google
- Plotting time-series prices
- Plotting volume-series data
- Calculating the simple daily percentage change in closing price
- Calculating simple daily cumulative returns of a stock
- Resampling data from daily to monthly returns
- Analyzing distribution of returns
- Performing a moving-average calculation
- Comparison of average daily returns across stocks
- Correlation of stocks based on the daily percentage change of the closing price
- Calculating the volatility of stocks
- Determining risk relative to expected returns
- Summary
Product information
- Title: Learning pandas - Second Edition
- Author(s):
- Release date: June 2017
- Publisher(s): Packt Publishing
- ISBN: 9781787123137