Mastering Python Data Analysis

ISBN: 9781783553297

by Magnus Vilhelm Persson, Luiz Felipe Martins

June 2016

Beginner to intermediate

284 pages

6h 22m

English

Packt Publishing

Mastering Python Data Analysis
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for

Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Tools of the Trade
Before you start
Using the notebook interface
Imports
An example using the Pandas library
Summary
2. Exploring Data
The General Social Survey
Obtaining the data
Reading the data
Univariate data
Histograms
Making things pretty
Characterization
Concept of statistical inference
Numeric summaries and boxplots
Relationships between variables – scatterplots
Summary
3. Learning About Models
Models and experiments
The cumulative distribution function
Working with distributions
The probability density function
Where do models come from?
Multivariate distributions
Summary
4. Regression
Introducing linear regression
Getting the dataset
Testing with linear regression
Multivariate regression
Adding economic indicators
Taking a step back
Logistic regression
Some notes
Summary
5. Clustering
Introduction to cluster finding
Starting out simple – John Snow on cholera
K-means clustering
Suicide rate versus GDP versus absolute latitude
Hierarchical clustering analysis
Reading in and reducing the data
Hierarchical cluster algorithm
Summary
6. Bayesian Methods
The Bayesian method
Credible versus confidence intervals
Bayes formula
Python packages
U.S. air travel safety record
Getting the NTSB database
Binning the data
Bayesian analysis of the data
Binning by month
Plotting coordinates
Cartopy
Mpl toolkits – basemap
Climate change - CO2 in the atmosphere
Getting the data
Creating and sampling the model
Summary
7. Supervised and Unsupervised Learning
Introduction to machine learning
Scikit-learn
Linear regression
Climate data
Checking with Bayesian analysis and OLS
Clustering
Seeds classification
Visualizing the data
Feature selection
Classifying the data
The SVC linear kernel
The SVC Radial Basis Function
The SVC polynomial
K-Nearest Neighbour
Random Forest
Choosing your classifier
Summary
8. Time Series Analysis
Introduction
Pandas and time series data
Indexing and slicing
Resampling, smoothing, and other estimates
Stationarity
Patterns and components
Decomposing components
Differencing
Time series models
Autoregressive – AR
Moving average – MA
Selecting p and q
Automatic function
The (Partial) AutoCorrelation Function
Autoregressive Integrated Moving Average – ARIMA
Summary
A. More on Jupyter Notebook and matplotlib Styles
Jupyter Notebook
Useful keyboard shortcuts
Command mode shortcuts
Edit mode shortcuts
Markdown cells
Notebook Python extensions
Installing the extensions
Codefolding
Collapsible headings
Help panel
Initialization cells
NbExtensions menu item
Ruler
Skip-traceback
Table of contents
Other Jupyter Notebook tips
External connections
Export
Additional file types
Matplotlib styles
Useful resources
General resources
Packages
Data repositories
Visualization of data
Summary

Content preview from Mastering Python Data Analysis

Chapter 5. Clustering

When data comprise several separate distributions, how do we find and characterize them? In this chapter, we will look at some ways to identify clusters in data. Groups of points with similar characteristics form clusters. Many algorithms and methods exist for this, each with strengths and weaknesses. We want to detect the separate distributions in the data and, for each point, determine its degree of association (or similarity) with other points and clusters. The degree of association should be high for points that belong in the same cluster and low for points that do not. As in previous chapters, the problem can be one-dimensional or multi-dimensional. One of the inherent difficulties of cluster finding ...
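To make the idea of a degree of association concrete, here is a minimal sketch that is not taken from the book. It assumes Euclidean distance as the (inverse) measure of association and uses synthetic data with two made-up group centres. The names `group_a`, `group_b`, `within`, and `between` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical, well-separated 2-D distributions (synthetic data).
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])
labels = np.array([0] * 50 + [1] * 50)

# Pairwise Euclidean distances: a small distance is read here as a
# high degree of association (an assumption of this sketch).
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Masks for pairs in the same group (excluding each point paired with itself)
# and pairs in different groups.
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(points), dtype=bool)

within = dist[same & off_diag].mean()
between = dist[~same].mean()
print(f"mean distance within clusters:  {within:.2f}")
print(f"mean distance between clusters: {between:.2f}")
```

In this sketch the group labels are known in advance, so it only shows what a clustering method looks for: points in the same cluster lie much closer to each other than to points in the other cluster. A clustering algorithm has to recover the labels from the distances alone.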
