Table of Contents

 Preface
 1. Getting Started with Raw Data
 2. Inferential Statistics

 3. Finding a Needle in a Haystack
   What is data mining?
   Presenting an analysis
   Studying the Titanic
     Which passenger class has the maximum number of survivors?
     What is the distribution of survivors based on gender among the various classes?
     What is the distribution of nonsurvivors among the various classes who have family aboard the ship?
     What was the survival percentage among different age groups?
   Summary
 4. Making Sense of Data through Advanced Visualization
 5. Uncovering Machine Learning
 6. Performing Predictions with a Linear Regression
 7. Estimating the Likelihood of Events
 8. Generating Recommendations with Collaborative Filtering

 9. Pushing Boundaries with Ensemble Models
   The census income dataset
   Exploring the census data
     Hypothesis 1: People who are older earn more
     Hypothesis 2: Income bias based on working class
     Hypothesis 3: People with more education earn more
     Hypothesis 4: Married people tend to earn more
     Hypothesis 5: There is a bias in income based on race
     Hypothesis 6: There is a bias in the income based on occupation
     Hypothesis 7: Men earn more
     Hypothesis 8: People who clock in more hours earn more
     Hypothesis 9: There is a bias in income based on the country of origin
   Decision trees
   Random forests
   Summary

 10. Applying Segmentation with k-means Clustering
 11. Analyzing Unstructured Data with Text Mining
 12. Leveraging Python in the World of Big Data
 Index
