O'Reilly Live Online Training

Inside unsupervised learning: Semisupervised learning using autoencoders

Explore automatic feature engineering using autoencoders and build semisupervised solutions

Ankur Patel

Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to general artificial intelligence. Conventional supervised learning cannot be applied to unlabeled data, which makes up the majority of the world's data. In these cases, unsupervised learning can help discover meaningful patterns buried deep in unlabeled datasets, patterns that would otherwise be nearly impossible for humans to uncover.

Join Ankur Patel for a deep dive into autoencoders, one of the core concepts of unsupervised learning, and an introduction to semisupervised learning. Autoencoders are shallow neural networks trained to reconstruct their input through a hidden layer (typically smaller than the input); that hidden layer produces a new, learned representation of the original data. In other words, autoencoders perform automatic feature engineering, reducing the need for manual feature engineering and speeding up the development of machine learning systems. Autoencoders also make it possible to exploit the information in a partially labeled dataset. With autoencoders, you can turn unsupervised machine learning problems into semisupervised ones.

In just 90 minutes, you'll learn how to build a credit card fraud detection system in three ways: with unsupervised learning (without using any labels); with supervised learning (using a partially labeled dataset); and with semisupervised learning (applying autoencoders, an unsupervised technique, to the partially labeled dataset and combining them with a supervised approach). Then you'll compare the results to determine the strengths and weaknesses of each method.
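The unsupervised route can be illustrated with a minimal sketch, assuming plain NumPy in place of the TensorFlow and Keras stack the course uses. It trains a linear autoencoder (5 inputs squeezed through a 2-unit hidden layer and then reconstructed) by gradient descent on mean squared error, reads the hidden layer as the learned representation, and uses each row's reconstruction error as an anomaly score, so no labels are involved. The toy data, the helpers `train_linear_autoencoder` and `reconstruction_error`, and all hyperparameters are hypothetical and are not taken from the course's credit card dataset.

```python
import numpy as np

# Hypothetical toy data, not the course's credit card dataset: 500 rows with
# 5 features driven by 2 hidden factors, plus a little noise.
rng = np.random.default_rng(42)
mixing = np.array([[1.0, 1.0, 0.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, 1.0, 0.0]])
X = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 5))


def train_linear_autoencoder(X, n_hidden, lr=0.02, epochs=3000, seed=0):
    """Hypothetical helper: fit a linear autoencoder (no biases, no activations)
    by full-batch gradient descent on mean squared reconstruction error."""
    init = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = init.normal(scale=0.1, size=(d, n_hidden))  # input -> hidden code
    W_dec = init.normal(scale=0.1, size=(n_hidden, d))  # hidden code -> reconstruction
    for _ in range(epochs):
        H = X @ W_enc               # learned representation (the bottleneck)
        err = H @ W_dec - X         # reconstruction minus input
        g = 2.0 * err / n           # gradient of the loss w.r.t. the reconstruction
        grad_dec = H.T @ g
        grad_enc = X.T @ (g @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec


def reconstruction_error(X, W_enc, W_dec):
    """Hypothetical helper: per-row squared error, used here as an anomaly score."""
    return np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1)


W_enc, W_dec = train_linear_autoencoder(X, n_hidden=2)
codes = X @ W_enc                                  # 500 x 2 learned features
normal_scores = reconstruction_error(X, W_enc, W_dec)

# A hypothetical row that breaks the pattern shared by the training rows.
outlier = np.array([[3.0, 0.0, 0.0, 0.0, -3.0]])
outlier_score = reconstruction_error(outlier, W_enc, W_dec)[0]
print(f"mean normal score: {normal_scores.mean():.4f}, outlier score: {outlier_score:.2f}")
```

Rows that follow the structure the autoencoder learned are reconstructed almost exactly, while a row that breaks that structure gets a much larger error; in a fraud setting, the highest-scoring transactions would be the ones flagged for review. A production model would typically add bias terms and nonlinear activations.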

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • How to work with TensorFlow and Keras
  • Why neural networks are so powerful
  • How to learn representations using autoencoders
  • How to turn unsupervised learning problems into semisupervised ones

And you’ll be able to:

  • Build shallow neural networks (e.g., autoencoders)
  • Apply autoencoders to a partially labeled dataset and feed newly learned representations into a supervised model, developing a semisupervised solution
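The semisupervised pipeline in the second bullet above can be sketched in two steps, again in plain NumPy rather than Keras: learn an encoder from every row, labeled or not, then train a supervised classifier on the encoded features of the few labeled rows. To keep the sketch short, the encoder comes from a singular value decomposition: for a linear autoencoder trained with squared error, the best 2-unit encoder spans the same subspace as the top two principal components, so the SVD stands in for gradient-descent training here. The toy data, the 100-of-1,000 labeling split, the helper names `encode` and `fit_logistic`, and the choice of logistic regression as the supervised model are all assumptions for illustration, not details from the course.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical toy data: 1,000 rows driven by 2 hidden factors; only 100 rows carry labels.
mixing = np.array([[1.0, 1.0, 0.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0, 1.0, 0.0]])
Z = rng.normal(size=(1000, 2))
X = Z @ mixing + 0.05 * rng.normal(size=(1000, 5))
y = (Z[:, 0] + Z[:, 1] > 0).astype(float)   # true labels, mostly hidden from training
labeled = np.zeros(1000, dtype=bool)
labeled[rng.choice(1000, size=100, replace=False)] = True

# Step 1 (unsupervised, uses ALL rows): a 2-unit linear encoder.
# Closed-form shortcut: the optimal linear autoencoder spans the top principal subspace.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
W_enc = Vt[:2].T                              # 5 x 2 encoder weights


def encode(X):
    """Hypothetical helper: map raw features to the learned 2-D representation."""
    return (X - mean) @ W_enc


codes = encode(X)


# Step 2 (supervised, uses ONLY labeled rows): logistic regression on the codes.
def fit_logistic(C, t, lr=0.5, epochs=2000):
    """Hypothetical helper: batch gradient descent on the logistic (cross-entropy) loss."""
    w, b = np.zeros(C.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(C @ w + b)))  # predicted probability of class 1
        w -= lr * C.T @ (p - t) / len(t)
        b -= lr * np.mean(p - t)
    return w, b


w, b = fit_logistic(codes[labeled], y[labeled])

# Evaluate on the rows whose labels the classifier never saw.
pred = (codes[~labeled] @ w + b > 0).astype(float)
accuracy = np.mean(pred == y[~labeled])
print(f"held-out accuracy: {accuracy:.3f}")
```

Only 100 of the 1,000 labels reach the classifier, while the encoder benefits from all 1,000 rows. When the structure in the data is not linear, a trained nonlinear autoencoder would take the place of the SVD step.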

This training course is for you because...

  • You're a data scientist or engineer who wants to work with unlabeled data.
  • You want to perform semisupervised learning to solve a business use case.

Prerequisites

  • A working knowledge of Python
  • A basic understanding of machine learning

About your instructor

  • Ankur A. Patel is the Vice President of Data Science at 7Park Data, a Vista Equity Partners portfolio company. At 7Park Data, Ankur and his data science team use alternative data to build data products for hedge funds and corporations, and they develop machine learning as a service (MLaaS) for enterprise clients. MLaaS includes natural language processing (NLP), anomaly detection, clustering, and time series prediction. Prior to 7Park Data, Ankur led data science efforts in New York City for the Israeli artificial intelligence firm ThetaRay, one of the world's pioneers in applied unsupervised learning.

    Ankur began his career as an analyst at J.P. Morgan and then became the lead emerging markets sovereign credit trader at Bridgewater Associates, the world's largest global macro hedge fund. He later founded R-Squared Macro, a machine learning-based hedge fund, and managed it for five years. A graduate of the Woodrow Wilson School at Princeton University, Ankur is the recipient of the Lieutenant John A. Larkin Memorial Prize.

    He currently lives in Tribeca, New York City, but travels extensively abroad.

Schedule

The time frames are only estimates and may vary depending on how the class progresses.

Introduction to unsupervised learning (10 minutes)

  • Lecture and hands-on exercises: How unsupervised learning fits into the machine learning ecosystem; common problems in machine learning—finding patterns without using labels and leveraging partially labeled datasets to build good machine learning solutions

Motivation for representation learning (10 minutes)

  • Lecture and hands-on exercises: Why neural networks are powerful; why automatic feature engineering is important; how representation learning improves machine learning performance

Motivation for semisupervised learning (10 minutes)

  • Lecture and hands-on exercises: How supervised and unsupervised learning complement each other; how to capture information embedded in a partially labeled dataset and use the embedded information to improve machine learning performance
  • Q&A (5 minutes)
  • Break (5 minutes)

Data preparation (10 minutes)

  • Lecture and hands-on exercises: Explore data in a Jupyter notebook; prepare the credit card dataset

Autoencoders (15 minutes)

  • Lecture and hands-on exercises: Introduction to autoencoders; train autoencoders

Semisupervised learning (15 minutes)

  • Lecture and hands-on exercises: Build an unsupervised fraud detection system; build a supervised fraud detection system; build a semisupervised fraud detection system; compare and contrast the results

Wrap-up and Q&A (10 minutes)