Book description
Understand, explore, and effectively present data using the powerful data visualization techniques of Python
Key Features
 Use the power of Pandas and Matplotlib to easily solve data mining issues
 Understand the basics of statistics to build powerful predictive data models
 Grasp data mining concepts with helpful use cases and examples
Book Description
Data mining, or parsing the data to extract useful insights, is a niche skill that can transform your career as a data scientist. Python is a flexible programming language that is equipped with a strong suite of libraries and toolkits, and gives you the perfect platform to sift through your data and mine the insights you seek. This Learning Path is designed to familiarize you with the Python libraries and the underlying statistics that you need to get comfortable with data mining.
You will learn how to use Pandas, Python's popular library for analyzing different kinds of data, and leverage the power of Matplotlib to generate appealing and impressive visualizations for the insights you have derived. You will also explore different machine learning techniques and statistics that enable you to build powerful predictive models.
By the end of this Learning Path, you will have the perfect foundation to take your data mining skills to the next level and set yourself on the path to becoming a sought-after data science professional.
This Learning Path includes content from the following Packt products:
 Statistics for Machine Learning by Pratap Dangeti
 Matplotlib 2.x By Example by Allen Yu, Claire Chung, Aldrin Yim
 Pandas Cookbook by Theodore Petrou
What you will learn
 Understand the statistical fundamentals to build data models
 Split data into independent groups
 Apply aggregations and transformations to each group
 Create impressive data visualizations
 Prepare your data and design models
 Clean up data to ease data analysis and visualization
 Create insightful visualizations with Matplotlib and Seaborn
 Customize the model to suit your own predictive goals
Who this book is for
If you want to learn how to use the many libraries of Python to extract impactful information from your data and present it as engaging visuals, then this is the ideal Learning Path for you. Some basic knowledge of Python is enough to get started with this Learning Path.
Downloading the example code for this book
You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files emailed directly to you.
Table of contents
 Title Page
 Copyright
 Contributors
 About Packt
 Preface
 Journey from Statistics to Machine Learning

Tree-Based Machine Learning Models
 Introducing decision tree classifiers
 Comparison between logistic regression and decision trees
 Comparison of error components across various styles of models
 Remedial actions to push the model towards the ideal region
 HR attrition data example
 Decision tree classifier
 Tuning class weights in decision tree classifier
 Bagging classifier
 Random forest classifier
 Random forest classifier - grid search
 AdaBoost classifier
 Gradient boosting classifier
 Comparison between AdaBoosting versus gradient boosting
 Extreme gradient boosting - XGBoost classifier
 Ensemble of ensembles - model stacking
 Ensemble of ensembles with different types of classifiers
 Ensemble of ensembles with bootstrap samples using a single type of classifier
 Summary

K-Nearest Neighbors and Naive Bayes
 K-nearest neighbors
 KNN classifier with breast cancer Wisconsin data example
 Tuning of k-value in KNN classifier
 Naive Bayes
 Probability fundamentals
 Understanding Bayes theorem with conditional probability
 Naive Bayes classification
 Laplace estimator
 Naive Bayes SMS spam classification example
 Summary
 Unsupervised Learning

Reinforcement Learning
 Reinforcement learning basics
 Markov decision processes and Bellman equations
 Dynamic programming
 Grid world example using value and policy iteration algorithms with basic Python
 Monte Carlo methods
 Temporal difference learning
 SARSA on-policy TD control
 Q-learning - off-policy TD control
 Cliff walking example of on-policy and off-policy of TD control
 Further reading
 Summary

Hello Plotting World!
 Hello Matplotlib!
 Plotting our first graph
 Summary
 Visualizing Online Data
 Visualizing Multivariate Data
 Adding Interactivity and Animating Plots
 Selecting Subsets of Data

Boolean Indexing
 Calculating boolean statistics
 Constructing multiple boolean conditions
 Filtering with boolean indexing
 Replicating boolean indexing with index selection
 Selecting with unique and sorted indexes
 Gaining perspective on stock prices
 Translating SQL WHERE clauses
 Determining the normality of stock market returns
 Improving readability of boolean indexing with the query method
 Preserving Series with the where method
 Masking DataFrame rows
 Selecting with booleans, integer location, and labels
 Index Alignment

Grouping for Aggregation, Filtration, and Transformation
 Defining an aggregation
 Grouping and aggregating with multiple columns and functions
 Removing the MultiIndex after grouping
 Customizing an aggregation function
 Customizing aggregating functions with *args and **kwargs
 Examining the groupby object
 Filtering for states with a minority majority
 Transforming through a weight loss bet
 Calculating weighted mean SAT scores per state with apply
 Grouping by continuous variables
 Counting the total number of flights between cities
 Finding the longest streak of on-time flights

Restructuring Data into a Tidy Form
 Tidying variable values as column names with stack
 Tidying variable values as column names with melt
 Stacking multiple groups of variables simultaneously
 Inverting stacked data
 Unstacking after a groupby aggregation
 Replicating pivot_table with a groupby aggregation
 Renaming axis levels for easy reshaping
 Tidying when multiple variables are stored as column names
 Tidying when multiple variables are stored as column values
 Tidying when two or more values are stored in the same cell
 Tidying when variables are stored in column names and values
 Tidying when multiple observational units are stored in the same table
 Combining Pandas Objects
 Other Books You May Enjoy
Product information
 Title: Numerical Computing with Python
 Author(s):
 Release date: December 2018
 Publisher(s): Packt Publishing
 ISBN: 9781789953633