Book description
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.
With this handbook, you’ll learn how to use:
 IPython and Jupyter: provide computational environments for data scientists using Python
 NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
 Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
 Matplotlib: includes capabilities for a flexible range of data visualizations in Python
 Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms
Table of contents
 Preface
 1. IPython: Beyond Normal Python

2. Introduction to NumPy
 Understanding Data Types in Python
 The Basics of NumPy Arrays
 Computation on NumPy Arrays: Universal Functions
 Aggregations: Min, Max, and Everything in Between
 Computation on Arrays: Broadcasting
 Comparisons, Masks, and Boolean Logic
 Fancy Indexing
 Sorting Arrays
 Structured Data: NumPy’s Structured Arrays

3. Data Manipulation with Pandas
 Installing and Using Pandas
 Introducing Pandas Objects
 Data Indexing and Selection
 Operating on Data in Pandas
 Handling Missing Data
 Hierarchical Indexing
 Combining Datasets: Concat and Append
 Combining Datasets: Merge and Join
 Aggregation and Grouping
 Pivot Tables
 Vectorized String Operations
 Working with Time Series
 High-Performance Pandas: eval() and query()
 Further Resources

4. Visualization with Matplotlib
 General Matplotlib Tips
 Two Interfaces for the Price of One
 Simple Line Plots
 Simple Scatter Plots
 Visualizing Errors
 Density and Contour Plots
 Histograms, Binnings, and Density
 Customizing Plot Legends
 Customizing Colorbars
 Multiple Subplots
 Text and Annotation
 Customizing Ticks
 Customizing Matplotlib: Configurations and Stylesheets
 Three-Dimensional Plotting in Matplotlib
 Geographic Data with Basemap
 Visualization with Seaborn
 Further Resources

5. Machine Learning
 What Is Machine Learning?
 Introducing Scikit-Learn
 Hyperparameters and Model Validation
 Feature Engineering
 In Depth: Naive Bayes Classification
 In Depth: Linear Regression
 In-Depth: Support Vector Machines
 In-Depth: Decision Trees and Random Forests
 In Depth: Principal Component Analysis
 In-Depth: Manifold Learning
 In Depth: k-Means Clustering
 In Depth: Gaussian Mixture Models
 In-Depth: Kernel Density Estimation
 Application: A Face Detection Pipeline
 Further Machine Learning Resources
 Index
Product information
 Title: Python Data Science Handbook
 Author(s):
 Release date: November 2016
 Publisher(s): O'Reilly Media, Inc.
 ISBN: 9781491912058