O'Reilly live online training

Practical Machine Learning: Ensemble and deep learning with unstructured data

Topic: Data
Matt Kirk

In this workshop, we will talk about the specifics of feature learning using neural nets, ensemble learning, and data transforms that are valuable in practice. The workshop is a continuation of your study of machine learning and assumes a basic knowledge of supervised learning.

Machine learning is becoming a required skill for many software developers and analysts. This workshop will walk you through everyday situations, such as dealing with too many dimensions, diagnosing errors, and learning structure from unstructured data. We will focus more on the mindset and mental model behind the practice than on the academic grounding (although we won't prevent you from reading further after the class!).

What you'll learn, and how you can apply it

By the end of this live online course, you’ll understand:

  • What features are and how to go about learning them.
  • What ensembles are and why they are so powerful.
  • How to diagnose common machine learning issues in practice, such as bias, overtraining, undertraining, and high dimensionality.
  • What deep learning’s primary advantage is over traditional machine learning.

And you’ll be able to:

  • Transform raw data into useful features
  • Recognize the gender of a voice using an ensemble classifier
  • Learn an embedding for words (or any unstructured element) and understand why that’s valuable

This training course is for you because...

  • You’re a data analyst or developer looking to advance your career.
  • You work with data scientists and want to understand their process.
  • You want to become a data scientist or machine learning engineer.

About your instructor

  • Matt Kirk is a data architect, software engineer, and entrepreneur based out of Seattle, WA.

    For years, he struggled to piece together his quantitative finance background with his passion for building software.

    Then he discovered his affinity for solving problems with data.

    Now, he helps multimillion-dollar companies with their data projects. From diamond recommendation engines to marketing automation tools, he loves educating engineering teams about methods to start their big data projects.

    To learn more about how you can get started with your big data project (beyond taking this class), check out matthewkirk.com for tips.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (10 minutes)

  • "When linear regression fails, use something else." This class is something else.
  • Why are you here?
  • About Matt

Feature engineering, ensemble learning, and what happens when things go wrong (50 minutes)

  • Feature engineering in a nutshell: polish on the existing data
  • The effect of feature engineering as it relates to the curse of dimensionality
  • Ensemble Learning in a nutshell: as it relates to weather reports
  • What happens when models don't work?
  • Matrix factorization
  • Ensemble Learning
  • Bagging
  • Boosting
  • Deep Learning
  • Flowchart for tuning models to their best capacity
  • Deep Learning and why it's so valuable
  • White, Grey, and Black box testing
  • Black Box Testing: sklearn.metrics
  • Grey Box Testing: LIME, SHAP
  • White Box Testing: Linear Regression and Traditional Stats
  • Quiz
  • Discussion and Q&A
  • Reflection: the Heisenberg uncertainty principle as it relates to machine learning.
  • Break (5 minutes)

Matrix factorization and ensemble learning (60 minutes)

  • Presentation with Katacoda scenarios to play along with:
  • Predicting the weather and the butterfly effect.
  • "The first step is simple stupid." What is the absolute simplest solution?
  • When that doesn't work: matrix factorization
  • Feature Selection
  • Feature Transforms
  • PCA
  • t-SNE
  • UMAP
  • Bagging
  • Boosting
  • AdaBoost
  • CatBoost
  • XGBoost
  • Quiz
  • Demonstration: Watch over Matt's shoulder and tear apart a problem of detecting gender in voice data.
  • Exercise: Given an initial classifier that predicts gender from voice, can you do better by working through the flowchart presented?
  • Q&A
  • Break (5 minutes)

A very shallow intro to deep learning (55 minutes)

  • Presentation with Katacoda to play along with:
  • Supervised Learning, Unsupervised, Reinforcement Learning… Deep??
  • What exactly is deep learning? A focus on Yoshua Bengio's definition of feature learning.
  • Working memory and chunking and how that relates to deep learning
  • The basics of neural networks. Very shallow introduction to a deep subject.
  • Representation Learning and its use
  • Word Embedding
  • Image Features
  • Use of learned Representations:
  • Fine-tuning
  • Using representations as input data
  • The cost of deep learning: why use it to begin with?
  • Quiz
  • Demonstration: Watch over Matt's shoulder as we look at word embeddings and how to determine sentiment, comparing the old-school approach with the modern one using word2vec or similar methods.
  • Exercise: Break out to classify the sentiment of text using deep learning techniques such as a bidirectional LSTM or word2vec-based classification. Again, follow the choose-your-own-adventure flowchart.
  • Q&A

Wrap-up and conclusion (5 minutes)

  • Presentation:
  • What machine learning is and is not.
  • Predicting gender from voice: what did we learn?
  • Predicting sentiment: what did we learn?
  • Wrap-up of the process for hardening and improving machine learning models through ensembles and deep learning techniques.
  • Discussion: What was your number? And how are you going to bring this back to your job?
  • Q&A