
Mastering Python Data Analysis

ISBN: 9781783553297

by Magnus Vilhelm Persson, Luiz Felipe Martins

June 2016

Beginner to intermediate

284 pages

6h 22m

English

Packt Publishing

Mastering Python Data Analysis
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for

Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Tools of the Trade
Before you start
Using the notebook interface
Imports
An example using the Pandas library
Summary
2. Exploring Data
The General Social Survey
Obtaining the data
Reading the data
Univariate data
Histograms
Making things pretty
Characterization
Concept of statistical inference
Numeric summaries and boxplots
Relationships between variables – scatterplots
Summary
3. Learning About Models
Models and experiments
The cumulative distribution function
Working with distributions
The probability density function
Where do models come from?
Multivariate distributions
Summary
4. Regression
Introducing linear regression
Getting the dataset
Testing with linear regression
Multivariate regression
Adding economic indicators
Taking a step back
Logistic regression
Some notes
Summary
5. Clustering
Introduction to cluster finding
Starting out simple – John Snow on cholera
K-means clustering
Suicide rate versus GDP versus absolute latitude
Hierarchical clustering analysis
Reading in and reducing the data
Hierarchical cluster algorithm
Summary
6. Bayesian Methods
The Bayesian method
Credible versus confidence intervals
Bayes formula
Python packages
U.S. air travel safety record
Getting the NTSB database
Binning the data
Bayesian analysis of the data
Binning by month
Plotting coordinates
Cartopy
Mpl toolkits – basemap
Climate change - CO2 in the atmosphere
Getting the data
Creating and sampling the model
Summary
7. Supervised and Unsupervised Learning
Introduction to machine learning
Scikit-learn
Linear regression
Climate data
Checking with Bayesian analysis and OLS
Clustering
Seeds classification
Visualizing the data
Feature selection
Classifying the data
The SVC linear kernel
The SVC Radial Basis Function
The SVC polynomial
K-Nearest Neighbour
Random Forest
Choosing your classifier
Summary
8. Time Series Analysis
Introduction
Pandas and time series data
Indexing and slicing
Resampling, smoothing, and other estimates
Stationarity
Patterns and components
Decomposing components
Differencing
Time series models
Autoregressive – AR
Moving average – MA
Selecting p and q
Automatic function
The (Partial) AutoCorrelation Function
Autoregressive Integrated Moving Average – ARIMA
Summary
A. More on Jupyter Notebook and matplotlib Styles
Jupyter Notebook
Useful keyboard shortcuts
Command mode shortcuts
Edit mode shortcuts
Markdown cells
Notebook Python extensions
Installing the extensions
Codefolding
Collapsible headings
Help panel
Initialization cells
NbExtensions menu item
Ruler
Skip-traceback
Table of contents
Other Jupyter Notebook tips
External connections
Export
Additional file types
Matplotlib styles
Useful resources
General resources
Packages
Data repositories
Visualization of data
Summary

Overview

Mastering Python Data Analysis provides a comprehensive roadmap for Python developers to enhance their data analysis skills to tackle real-world problems. This book delves into advanced statistical analysis, covering tools, models, and methods to transform raw data into valuable insights.

What this book will help me do

Effectively handle and preprocess data using Python and Pandas.
Explore statistical models to identify patterns and gain insights from data.
Learn clustering approaches to detect data groupings and predict outcomes.
Utilize Bayesian methods for quantifying causal relationships.
Generate professional reports and visualizations with Python tools like Jupyter Notebook.

Author(s)

Magnus Vilhelm Persson is a seasoned software developer and data analyst with expertise in leveraging Python for sophisticated data analysis and machine learning tasks. Drawing from years of experience in the tech industry, Persson provides practical, real-world insights throughout the book. His approachable writing style ensures technical concepts are conveyed with clarity, making data analysis accessible to developers at varying skill levels.

Who is it for?

This book is ideal for intermediate Python developers seeking to elevate their data analysis skills. If you are familiar with Python libraries and have an interest in solving complex data problems, this guide will serve as a stepping stone to mastery. Advanced beginners with a curiosity for statistical methods and a desire to learn through practical examples will find this book invaluable. It is also perfect for professionals aiming to integrate Python-based statistical techniques into their workflow.


Python: End-to-end Data Analysis

Phuong Vothihong, Martin Czygan, Ivan Idris, Magnus Vilhelm Persson, Luiz Felipe Martins

Python: Data Analytics and Visualization

Phuong Vo.T.H, Martin Czygan, Ashish Kumar, Kirthi Raman

A Python Data Analyst’s Toolkit: Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics

Gayathri Rajagopalan

Python Data Analysis Cookbook

Ivan Idris
