Advanced Machine Learning with scikit-learn
Robust models, textual analysis, and deeper feature analysis
The algorithmic extraction of knowledge from data, known as machine learning, is a central goal of data analysis and prediction in both business and science. Techniques for complex analysis of data, moving beyond the basic tools of statistics, have been steadily refined over the last two decades. Over a similar period, Python has grown to be the premier language for data science, and scikit-learn has become the main Python toolkit for general-purpose machine learning.
What you'll learn and how you can apply it
- Feature extraction techniques for textual data
- Feature selection classes (applicable to many types of problems)
- Robust train/test splits and cross-validation
- Value imputation
- Specialized and custom metrics for model evaluation
This training course is for you because...
You are an aspiring or beginning data scientist. Students in this course should have a comfortable intermediate-level knowledge of Python and a very basic familiarity with statistics and linear algebra. College students or working programmers who are motivated to expand their skills to include machine learning with Python are a perfect fit.
- A first course in Python and/or working experience as a programmer
- College-level basic mathematics
- A system with Jupyter notebooks installed, along with a recent version of scikit-learn, pandas, NumPy, matplotlib, and the general scientific Python tool stack. The training materials will be made available as notebooks in a GitHub repository.
This advanced class is intended for those who have attended or are already familiar with the topics covered in the following classes (search O’Reilly Learning for dates): Beginning Machine Learning with scikit-learn (Live Online Training) by David Mertz, and Intermediate Machine Learning with scikit-learn (Live Online Training) by David Mertz.
- Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurélien Géron
- Introduction to Machine Learning with Python, by Andreas C. Müller and Sarah Guido
- Machine Learning with scikit-learn LiveLessons (Video Training) by David Mertz
About your instructor
David Mertz is a data scientist, trainer, and erstwhile startup CTO, who is currently writing the Addison-Wesley title Cleaning Data for Successful Data Science: Doing the other 80% of the work. He created the training program for Anaconda, Inc. He was a Director of the Python Software Foundation for six years and remains chair of a few PSF committees. For nine years, David helped create the world's fastest, highly specialized supercomputer for performing molecular dynamics.
The timeframes are only estimates and may vary depending on how the class progresses.
Feature extraction techniques for textual data (45 min)
Feature selection classes (45 min)
Break (5 min)
Robust train/test splits and cross-validation (45 min)
Value imputation (45 min)
Break (5 min)
Specialized and custom metrics for model evaluation (45 min)
Class wrap-up (5 min)