Live Online Training

Machine Learning from Scratch

Thomas Nield

Machine learning is becoming more accessible thanks to libraries like scikit-learn and TensorFlow. In fact, it is becoming so accessible that few practitioners take the time to understand what happens under the hood. As machine learning becomes increasingly commoditized, those interested in the field should take time to understand these algorithms intimately to stay competitive. The risks of trusting black boxes, from lack of insight to complete misuse, will often outweigh the benefits of convenience.

In this course, we will take a highly practical approach to building machine learning algorithms from scratch with Python, including linear regression, logistic regression, Naïve Bayes, decision trees, and neural networks. This will give you a better understanding of how machine learning works and allow you to use libraries (or build them from scratch) more confidently. We will learn some simple but powerful optimization tools to generalize solutions quickly, while avoiding distracting concepts like calculus, partial derivatives, and linear algebra.

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The fundamental concepts behind different machine learning algorithms, as well as regression and classification tasks
  • The challenges and strengths of each machine learning model
  • What makes “machine learning” tick, and different ways to perform regression and classification

And you’ll be able to:

  • Build linear regression, logistic regression, Naïve Bayes, decision trees, and neural network models completely from scratch
  • Leverage hill climbing to optimize machine learning parameters easily and without calculus
  • Develop intuition on how machine learning libraries work

This training course is for you because...

  • You’re a data science professional wanting to interpret machine learning beyond a “black box” understanding
  • You’re a programmer who wants to see what machine learning is all about, and how to do it from scratch
  • You’re someone who is not intimidated by some code and basic math, and who wants to see how these two areas can be combined to do regression and classification tasks

Prerequisites

  • Comfort and proficiency with Python, including variables, functions, loops, generators, and classes.
  • Basic knowledge of NumPy and/or Pandas is recommended, but not required.

Recommended preparation:

Recommended follow-up:

Attend Intro to Mathematical Optimization (live online training course with Thomas Nield)

About your instructor

  • Thomas Nield (author of Getting Started with SQL) has a business analyst background and works at Southwest Airlines in Revenue Management. Early in his career he became fascinated with technology and bought dozens of books to master database design and programming in Java, C#, and Kotlin. He is passionate about sharing what he learns and enabling others with new skillsets, even if they do not work in IT. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it.

    Thomas has developed several database-driven applications for Southwest Airlines that generate revenue for the entire airline network. He believes technology should conform to the business, and emphasizes usefulness and real-world practicality while balancing the perspectives of IT and business professionals.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Getting Started (10 minutes)

  • Presentation: Overview and Expectations
  • Demo: Using hill climbing to find a square root
  • Discussion: The importance of optimization
  • Exercise: Using hill climbing to find a cube root
  • Q&A
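
To give a flavor of the square root demo listed above, here is a minimal sketch of random hill climbing. It is not the course's actual code: the function name `hill_climb_sqrt`, the starting guess, the step size, and the iteration count are all assumptions chosen for illustration.

```python
import random


def hill_climb_sqrt(target, iterations=50_000, seed=0):
    """Estimate the square root of a positive number by random hill climbing.

    Illustrative sketch only; the name and parameters are hypothetical.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    guess = 1.0
    best_loss = abs(guess * guess - target)

    for _ in range(iterations):
        # Propose a small random adjustment to the current guess.
        candidate = guess + rng.gauss(0, 0.1)
        loss = abs(candidate * candidate - target)

        # Keep the adjustment only if it reduces the error.
        if loss < best_loss:
            guess, best_loss = candidate, loss

    return guess


if __name__ == "__main__":
    print(hill_climb_sqrt(25))  # should land close to 5
```

The same loop works for a cube root by changing the loss to compare `candidate ** 3` with the target. No derivatives are needed: the search only asks whether a random change made the error smaller.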

Linear Regression and K-Means Clustering (50 minutes)

  • Presentation: Fundamentals, minimizing the sum/mean of squares
  • Walkthrough: Simple linear regression
  • Walkthrough: Multivariable linear regression
  • Walkthrough: K-Means clustering
  • Exercise: Linear regression
  • Q&A
  • Break (5 minutes)
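
As an illustration of minimizing the sum of squares without calculus, the sketch below fits a line `y = m*x + b` by random hill climbing. It is written under assumptions rather than taken from the course: the name `fit_line`, the step size, and the sample data are hypothetical.

```python
import random


def fit_line(points, iterations=50_000, seed=0):
    """Fit y = m*x + b to (x, y) pairs by random hill climbing.

    Illustrative sketch only; the name and parameters are hypothetical.
    """
    rng = random.Random(seed)

    def sum_of_squares(m, b):
        # Total squared vertical distance between the line and the points.
        return sum((y - (m * x + b)) ** 2 for x, y in points)

    m, b = 0.0, 0.0
    best_loss = sum_of_squares(m, b)

    for _ in range(iterations):
        # Nudge the slope and the intercept by small random amounts.
        new_m = m + rng.gauss(0, 0.1)
        new_b = b + rng.gauss(0, 0.1)
        loss = sum_of_squares(new_m, new_b)

        # Accept the nudge only if the sum of squares decreases.
        if loss < best_loss:
            m, b, best_loss = new_m, new_b, loss

    return m, b


if __name__ == "__main__":
    # Made-up data lying exactly on y = 2x + 1.
    data = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
    print(fit_line(data))  # should be close to (2, 1)
```

Random search like this is far slower than the closed-form or gradient-based solutions that libraries use, but it makes the objective being minimized easy to see.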

Logistic Regression (40 minutes)

  • Presentation: Logistic regression concepts
  • Walkthrough: Simple logistic regression
  • Walkthrough: Multivariable logistic regression
  • Walkthrough: Logistic regression to categorize text
  • Exercise: Testing the model
  • Q&A

Naïve Bayes (35 minutes)

  • Walkthrough: Categorizing text demo
  • Presentation: How to implement Naïve Bayes
  • Walkthrough/Exercise: Building an email spam classifier
  • Exercise: Testing the model
  • Q&A
  • Break (5 minutes)

Decision Trees (40 minutes)

  • Presentation: Decision tree fundamentals
  • Walkthrough: Building a decision tree
  • Exercise: Gini scoring, testing the model
  • Q&A
  • Break (5 minutes)
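
For readers unfamiliar with the Gini scoring mentioned above, the following sketch computes Gini impurity for a set of class labels and the weighted impurity of a candidate split. This is the standard definition, not necessarily the exact scoring used in the course, and the function names `gini_impurity` and `weighted_gini` are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions (0.0 means pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def weighted_gini(left, right):
    """Score a split by averaging each side's impurity, weighted by its size."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) + (
        len(right) / total
    ) * gini_impurity(right)


if __name__ == "__main__":
    print(gini_impurity(["spam", "spam", "ham", "ham"]))    # evenly mixed labels
    print(weighted_gini(["spam", "spam"], ["ham", "ham"]))  # perfectly separated
```

A decision tree builder can try many candidate splits and keep the one with the lowest weighted impurity, since a lower score means each side of the split is closer to containing a single class.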

Neural Networks (50 minutes)

  • Presentation: Neural network fundamentals
  • Walkthrough: Building a neural network to classify colors
  • Exercise: Testing the model
  • Q&A