Data Science and Machine Learning with Python - Hands On!

Video Description

Perform data mining and Machine Learning efficiently using Python and Spark

About This Video

  • Take your first steps in the world of data science by understanding the tools and techniques of data analysis

  • Train efficient Machine Learning models in Python using supervised and unsupervised learning methods

  • Learn how to use Apache Spark for processing Big Data efficiently

In Detail

Data science is one of the most lucrative careers today. It involves analyzing large amounts of data and extracting actionable business insights from it with a variety of tools. This course will help you take your first steps in the world of data science and teach you to conduct data analysis and perform efficient machine learning using Python. You will extract value from your data with various data mining and data analysis techniques in Python, and build predictive models that forecast future outcomes. You will also learn how to perform large-scale machine learning on Big Data using Apache Spark. You don't have to be an expert Python programmer to get the most out of this course; a basic knowledge of Python programming is sufficient.

Table of Contents

    1. Chapter 1: Getting Started
      1. Introduction 00:02:45
      2. Getting What You Need 00:02:37
      3. Installing Enthought Canopy 00:06:51
      4. Python Basics – Part 1 00:15:58
      5. Python Basics – Part 2 00:09:41
      6. Running Python Scripts 00:03:55
      7. Introducing the Pandas Library 00:10:15
    2. Chapter 2: Statistics and Probability Refresher, and Python Practice
      1. Types of Data 00:06:59
      2. Mean, Median, and Mode 00:05:26
      3. Using Mean, Median, and Mode in Python 00:08:30
      4. Variation and Standard Deviation 00:11:12
      5. Probability Density Function and Probability Mass Function 00:03:28
      6. Common Data Distributions 00:07:45
      7. Percentiles and Moments 00:12:33
      8. A Crash Course in matplotlib 00:13:46
      9. Covariance and Correlation 00:11:31
      10. Conditional Probability 00:10:16
      11. Exercise Solution – Conditional Probability of Purchase by Age 00:02:19
      12. Bayes' Theorem 00:05:23
    3. Chapter 3: Predictive Models
      1. Linear Regression 00:11:01
      2. Polynomial Regression 00:08:05
      3. Multivariate Regression and Predicting Car Prices 00:09:53
      4. Multi-Level Models 00:04:37
    4. Chapter 4: Machine Learning with Python
      1. Supervised versus Unsupervised Learning and Train/Test 00:08:57
      2. Using Train/Test to Prevent Overfitting of a Polynomial Regression 00:05:48
      3. Bayesian Methods – Concepts 00:04:00
      4. Implementing a Spam Classifier with Naive Bayes 00:08:06
      5. K-Means Clustering 00:07:24
      6. Clustering People Based on Income and Age 00:05:14
      7. Measuring Entropy 00:03:10
      8. Decision Trees – Concepts 00:08:43
      9. Decision Trees – Predicting Hiring Decisions 00:09:47
      10. Ensemble Learning 00:05:59
      11. Support Vector Machines (SVM) Overview 00:04:28
      12. Using SVM to Cluster People with scikit-learn 00:05:36
    5. Chapter 5: Recommender Systems
      1. User-Based Collaborative Filtering 00:07:57
      2. Item-Based Collaborative Filtering 00:08:16
      3. Finding Movie Similarities 00:09:08
      4. Improving the Results of Movie Similarities 00:08:00
      5. Making Movie Recommendations to People 00:10:22
      6. Improve the Recommender's Results 00:05:30
    6. Chapter 6: More Data Mining and Machine Learning Techniques
      1. K-Nearest Neighbors – Concepts 00:03:45
      2. Using KNN to Predict a Rating for a Movie 00:12:29
      3. Dimensionality Reduction and Principal Component Analysis 00:05:44
      4. A PCA Example with the Iris Dataset 00:09:05
      5. Data Warehousing Overview – ETL and ELT 00:09:05
      6. Reinforcement Learning 00:12:44
    7. Chapter 7: Dealing with Real-World Data
      1. Bias/Variance Trade-off 00:06:16
      2. K-Fold Cross-Validation to Avoid Overfitting 00:10:55
      3. Data Cleaning and Normalization 00:07:10
      4. Cleaning Web Log Data 00:10:56
      5. Normalizing Numerical Data 00:03:23
      6. Detecting Outliers 00:07:00
    8. Chapter 8: Apache Spark – Machine Learning on Big Data
      1. Installing Spark – Part 1 00:07:03
      2. Installing Spark – Part 2 00:13:29
      3. Spark Introduction 00:09:11
      4. Spark and the Resilient Distributed Dataset (RDD) 00:11:42
      5. Introducing MLlib 00:05:09
      6. Decision Trees in Spark 00:16:01
      7. K-Means Clustering in Spark 00:11:07
      8. TF/IDF 00:06:44
      9. Searching Wikipedia with Spark 00:08:12
      10. Using the Spark 2.0 DataFrame API for MLlib 00:07:57
    9. Chapter 9: Experimental Design
      1. A/B Testing Concepts 00:08:23
      2. T-Tests and P-Values 00:06:00
      3. Hands On with T-Tests 00:06:04
      4. Determining How Long to Run an Experiment 00:03:25
      5. A/B Test Gotchas 00:09:27
    10. Chapter 10: Final Project
      1. Your Final Project Assignment 00:06:26
      2. Final Project Review 00:08:59
    11. Chapter 11: You Made It!
      1. More to Explore 00:02:59
      2. Bonus Video: Discounts on my Spark and MapReduce courses! 00:01:06