Book Description
This book covers the fundamentals of machine learning with Python in a concise and dynamic manner. It covers data mining and large-scale machine learning using Apache Spark.
About This Book
 Take your first steps in the world of data science by understanding the tools and techniques of data analysis
 Train efficient machine learning models in Python using supervised and unsupervised learning methods
 Learn how to use Apache Spark for processing Big Data efficiently
Who This Book Is For
If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book.
What You Will Learn
 Learn how to clean your data and ready it for analysis
 Implement the popular clustering and regression methods in Python
 Train efficient machine learning models using decision trees and random forests
 Visualize the results of your analysis using Python’s Matplotlib library
 Use Apache Spark’s MLlib package to perform machine learning on large datasets
In Detail
Join Frank Kane, who worked on Amazon and IMDb’s machine learning algorithms, as he guides you on your first steps into the world of data science. Hands-On Data Science and Python Machine Learning gives you the tools that you need to understand and explore the core topics in the field, and the confidence and practice to build and analyze your own machine learning models. With the help of interesting and easy-to-follow practical examples, Frank Kane explains potentially complex topics such as Bayesian methods and K-means clustering in a way that anybody can understand.
Based on Frank’s successful data science course, Hands-On Data Science and Python Machine Learning empowers you to conduct data analysis and perform efficient machine learning using Python. Let Frank help you unearth the value in your data using the various data mining and data analysis techniques available in Python, and develop efficient models to predict future results. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. The book covers preparing your data for analysis, training machine learning models, and visualizing the final data analysis.
Style and approach
This comprehensive book is a blend of theory and hands-on code examples in Python that you can use as a reference at any time.
Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files emailed to you.
Table of Contents
 Preface
 Getting Started
 Statistics and Probability Refresher, and Python Practice

Matplotlib and Advanced Probability Concepts

A crash course in Matplotlib
 Generating multiple plots on one graph
 Saving graphs as images
 Adjusting the axes
 Adding a grid
 Changing line types and colors
 Labeling axes and adding a legend
 A fun example
 Generating pie charts
 Generating bar charts
 Generating scatter plots
 Generating histograms
 Generating box-and-whisker plots
 Try it yourself
 Covariance and correlation
 Conditional probability
 Bayes' theorem
 Summary

 Predictive Models

Machine Learning with Python
 Machine learning and train/test
 Using train/test to prevent overfitting of a polynomial regression
 Bayesian methods - Concepts
 Implementing a spam classifier with Naïve Bayes
 K-Means clustering
 Clustering people based on income and age
 Measuring entropy
 Decision trees - Concepts
 Decision trees - Predicting hiring decisions using Python
 Ensemble learning
 Support vector machine overview
 Using SVM to cluster people by using scikit-learn
 Summary
 Recommender Systems
 More Data Mining and Machine Learning Techniques

Dealing with Real-World Data
 Bias/variance tradeoff
 K-fold cross-validation to avoid overfitting
 Data cleaning and normalisation
 Cleaning web log data
 Normalizing numerical data
 Detecting outliers
 Summary
 Apache Spark - Machine Learning on Big Data
 Testing and Experimental Design
Product Information
 Title: Hands-On Data Science and Python Machine Learning
 Author(s): Frank Kane
 Release date: July 2017
 Publisher(s): Packt Publishing
 ISBN: 9781787280748