Book Description
Explore the world of data science through Python and learn how to make sense of data
About This Book
 Master data science methods using Python and its libraries
 Create data visualizations and mine for patterns
 Advanced techniques for the four fundamentals of data science with Python: data mining, data analysis, data visualization, and machine learning
Who This Book Is For
If you are a Python developer who wants to master data science, this book is for you. Some knowledge of data science is assumed.
What You Will Learn
 Manage data and perform linear algebra in Python
 Derive inferences from the analysis by performing inferential statistics
 Solve data science problems in Python
 Create high-end visualizations using Python
 Evaluate and apply the linear regression technique to estimate the relationships among variables
 Build recommendation engines with various collaborative filtering algorithms
 Apply ensemble methods to improve your predictions
 Work with big data technologies to handle data at scale
In Detail
Data science is a relatively new knowledge domain that organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Python offers a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving.
This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science.
Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and mining data to reveal hidden patterns and trends. You will use the matplotlib library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression and logistic regression techniques to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions by applying ensemble methods.
Finally, you will perform k-means clustering, analyze unstructured data with different text mining techniques, and leverage the power of Python in big data analytics.
Style and approach
This book is an easy-to-follow, comprehensive guide to data science using Python. The topics covered in the book can all be used in real-world scenarios.
Table of Contents

Mastering Python for Data Science
 Credits
 About the Author
 About the Reviewers
 www.PacktPub.com
 Preface
 1. Getting Started with Raw Data
 2. Inferential Statistics

3. Finding a Needle in a Haystack
 What is data mining?
 Presenting an analysis

Studying the Titanic
 Which passenger class has the maximum number of survivors?
 What is the distribution of survivors based on gender among the various classes?
 What is the distribution of non-survivors among the various classes who have family aboard the ship?
 What was the survival percentage among different age groups?
 Summary
 4. Making Sense of Data through Advanced Visualization
 5. Uncovering Machine Learning
 6. Performing Predictions with a Linear Regression
 7. Estimating the Likelihood of Events
 8. Generating Recommendations with Collaborative Filtering

9. Pushing Boundaries with Ensemble Models

The census income dataset

Exploring the census data
 Hypothesis 1: People who are older earn more
 Hypothesis 2: Income bias based on working class
 Hypothesis 3: People with more education earn more
 Hypothesis 4: Married people tend to earn more
 Hypothesis 5: There is a bias in income based on race
 Hypothesis 6: There is a bias in the income based on occupation
 Hypothesis 7: Men earn more
 Hypothesis 8: People who clock in more hours earn more
 Hypothesis 9: There is a bias in income based on the country of origin
 Decision trees
 Random forests
 Summary
 10. Applying Segmentation with k-means Clustering
 11. Analyzing Unstructured Data with Text Mining
 12. Leveraging Python in the World of Big Data
 Index
Product Information
 Title: Mastering Python for Data Science
 Author(s):
 Release date: August 2015
 Publisher(s): Packt Publishing
 ISBN: 9781784390150