Data Science Modeling Tutorial

Video Description

This video series covers data science modeling concepts and methods. There are 12 clips in the series:

  • This first clip in the series introduces data science modeling and how modeling fits into the data science pipeline with data engineering and data distillation. Learn about variables and their types such as categorical, binary, and numeric. Understand features and feature sets.
  • This second clip in the series studies variables and features using techniques from statistics. Learn about variable scaling, linear and non-linear transformations, min-max normalization, and Z-normalization. We conclude with advice on normalizing a variable and recommend some useful Python resources.
  • This third clip in the series continues the discussion of featurization methods, focusing on the binning process. The approach and benefits of binning are discussed, along with how to use the results of binning to create a histogram.
  • This fourth clip in the series covers machine learning and how it applies to data science. The most important methodologies in machine learning are discussed, including supervised, unsupervised, and semi-supervised learning. Machine learning is distinguished from statistics, data mining, and artificial intelligence.
  • This fifth clip in the series covers unsupervised learning, and clustering in particular. Learn why clustering is so powerful and explore its many use cases. K-Means, the most popular clustering algorithm, is introduced, and common distance metrics such as Euclidean distance are explored.
  • This sixth clip in the series covers supervised learning and data model ensembles. Predictive analytics modeling is discussed along with classification and regression. Use cases are provided.
  • This seventh clip in the series covers how to assess the performance of a predictive analytics model. Sensitivity analysis and heuristics are also discussed.
  • This eighth clip in the series covers how to assess the performance of classifiers. Both the confusion matrix and ROC curve approaches are discussed, as well as derived metrics and binary classifiers.
  • This ninth clip in the series covers both basic and advanced regression methods. Learn about standard evaluation metrics such as Sum of Squared Errors (SSE) and Explained Variance.
  • This tenth clip in the series covers other important topics in data science modeling, including K-fold cross-validation (KFCV), model deployment, Application Programming Interfaces (APIs), and model maintenance.
  • This eleventh clip in the series shares resources you can use to further your data science knowledge, including specific books, videos, courses, and meetups.
  • This twelfth clip in the series covers next steps to take and projects to work on to apply the material in this course.
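
As a taste of the second clip's material, min-max normalization and Z-normalization can be sketched in a few lines of Python. This is a minimal illustration and is not taken from the videos: the function names and the sample `ages` data are hypothetical, and the Z-normalization here assumes the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_normalize(values):
    """Center values on mean 0 and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample data for illustration only.
ages = [20, 30, 40, 50, 60]
scaled = min_max_normalize(ages)    # values in the range [0, 1]
standardized = z_normalize(ages)    # mean 0, standard deviation 1
```

Min-max normalization keeps values in a fixed range, which is convenient when a bounded scale is needed, while Z-normalization expresses each value as a number of standard deviations from the mean.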