Machine Learning from Scratch
Machine learning is becoming more accessible thanks to libraries like scikit-learn and TensorFlow. In fact, it is becoming so accessible that few practitioners take the time to understand what happens under the hood. As machine learning becomes increasingly commoditized, those interested in it should take the time to understand these algorithms intimately to stay competitive. The risks of trusting black boxes, from lack of insight to complete misuse, will often outweigh the benefits of convenience.
In this course, we will take a highly practical approach to building machine learning algorithms from scratch with Python, including linear regression, logistic regression, Naïve Bayes, decision trees, and neural networks. This will give you a better understanding of how machine learning works and allow you to use libraries (or build them from scratch) more confidently. We will learn some simple but powerful optimization tools to generalize solutions quickly, while avoiding distracting concepts like calculus, partial derivatives, and linear algebra.
What you'll learn and how you can apply it
By the end of this live, hands-on, online course, you’ll understand:
 The fundamental concepts behind different machine learning algorithms, as well as regression and classification tasks
 The challenges and strengths of each machine learning model
 What makes “machine learning” tick, and different ways to perform regression and classification
And you’ll be able to:
 Build linear regression, logistic regression, Naïve Bayes, decision trees, and neural network models completely from scratch
 Leverage hill climbing to optimize machine learning parameters easily and without calculus
 Develop intuition on how machine learning libraries work
This training course is for you because...
 You’re a data science professional wanting to interpret machine learning beyond a “black box” understanding
 You’re a programmer who wants to see what machine learning is all about, and how to do it from scratch
 You’re not intimidated by some code and basic math, and you want to see how the two can be combined to do regression and classification tasks
Prerequisites
 Comfort and proficiency with Python, including variables, functions, loops, generators, and classes.
 Basic knowledge of NumPy and/or Pandas is recommended, but not required.
Recommended preparation:
 Set up a Python environment of your choice. The instructor will be using PyCharm with Python 3.7.
 GitHub files: https://github.com/thomasnield/oreilly_machine_learning_from_scratch/
 If you are new to NumPy or Pandas, consider reviewing chapters 4 “NumPy Basics: Arrays and Vectorized Computation” and 5 “Getting Started with pandas” in Python for Data Analysis, 2nd Edition (book).
Recommended follow-up:
Attend Intro to Mathematical Optimization (live online training course with Thomas Nield)
About your instructor

Thomas Nield (author of Getting Started with SQL) has a business analyst background and works at Southwest Airlines in Revenue Management. Early in his career he became fascinated with technology and bought dozens of books to master programming in Java, C#, Kotlin, and database design. He is passionate about sharing what he learns and enabling others with new skill sets, even if they do not work in IT. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it.
Thomas has developed several database-driven applications for Southwest Airlines that generate revenue for the entire airline network. He believes technology should conform to the business, and emphasizes usefulness and real-world practicality while balancing the perspectives of IT and business professionals.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Getting Started (10 minutes)
 Presentation: Overview and Expectations
 Demo: Using hill climbing to find a square root
 Discussion: The importance of optimization
 Exercise: Using hill climbing to find a cube root
 Q&A
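As a rough illustration of the square-root demo, here is a minimal sketch of hill climbing: propose a small random change to a guess and keep it only when it reduces the error. The function name `hill_climb_sqrt`, the step size, and the iteration count are assumptions made for this example and are not taken from the course materials.

```python
import random


def hill_climb_sqrt(n, iterations=10_000, step=0.1, seed=0):
    """Approximate the square root of n by random hill climbing.

    Hypothetical helper for illustration; parameters are arbitrary choices.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    guess = 1.0
    best_error = abs(guess * guess - n)

    for _ in range(iterations):
        # Propose a small random adjustment to the current guess.
        candidate = guess + rng.uniform(-step, step)
        error = abs(candidate * candidate - n)

        # Keep the adjustment only if it brings us closer to n.
        if error < best_error:
            guess, best_error = candidate, error

    return guess


if __name__ == "__main__":
    print(hill_climb_sqrt(25.0))
```

No derivative is computed anywhere; the only requirement is a way to score a guess, which is why the same loop can be reused for other problems.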
Linear Regression and K-Means Clustering (50 minutes)
 Presentation: Fundamentals, minimizing the sum/mean of squares
 Walkthrough: Simple linear regression
 Walkthrough: Multivariable linear regression
 Walkthrough: K-Means clustering
 Exercise: Linear regression
 Q&A
 Break (5 minutes)
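To make "minimizing the sum of squares" concrete, the sketch below fits a line y = m*x + b by hill climbing on the sum of squared errors, with no calculus involved. The names `sum_of_squares` and `fit_line`, the Gaussian step size, and the iteration count are assumptions chosen for this example; the course's own code may differ.

```python
import random


def sum_of_squares(points, m, b):
    """Sum of squared differences between actual y and predicted m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


def fit_line(points, iterations=20_000, step=0.1, seed=0):
    """Fit y = m*x + b by random hill climbing (illustrative sketch)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    m, b = 0.0, 0.0
    best_loss = sum_of_squares(points, m, b)

    for _ in range(iterations):
        new_m, new_b = m, b
        # Nudge exactly one of the two parameters by a small random amount.
        if rng.random() < 0.5:
            new_m += rng.gauss(0, step)
        else:
            new_b += rng.gauss(0, step)

        # Accept the nudge only if the sum of squares improves.
        new_loss = sum_of_squares(points, new_m, new_b)
        if new_loss < best_loss:
            m, b, best_loss = new_m, new_b, new_loss

    return m, b


if __name__ == "__main__":
    data = [(x, 2 * x + 1) for x in range(10)]  # made-up points on y = 2x + 1
    print(fit_line(data))
```

The data here is synthetic and noise-free, so the fitted slope and intercept should land close to 2 and 1; real data would settle on whichever line gives the smallest sum of squares.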
Logistic Regression (40 minutes)
 Presentation: Logistic regression concepts
 Walkthrough: Simple logistic regression
 Walkthrough: Multivariable logistic regression
 Walkthrough: Logistic regression to categorize text
 Exercise: Testing the model
 Q&A
Naïve Bayes (35 minutes)
 Walkthrough: Categorizing text demo
 Presentation: How to implement Naïve Bayes
 Walkthrough/Exercise: Building an email spam classifier
 Exercise: Testing the model
 Q&A
 Break (5 minutes)
Decision Trees (40 minutes)
 Presentation: Decision tree fundamentals
 Walkthrough: Building a decision tree
 Exercise: Gini scoring, testing the model
 Q&A
 Break (5 minutes)
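For readers unfamiliar with Gini scoring, the following sketch shows one common way to compute it: Gini impurity is 1 minus the sum of squared class proportions, and a candidate split is scored by the size-weighted impurity of its two sides. The function names `gini_impurity` and `weighted_gini` are hypothetical, and this is only one standard formulation, not necessarily the one used in the exercise.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k^2) over the class proportions p_k in labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # treat an empty group as perfectly pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(left, right):
    """Score a split as the size-weighted average impurity of both sides."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    print(gini_impurity(["spam", "spam", "ham", "ham"]))
    print(weighted_gini(["spam", "spam"], ["ham", "ham"]))
```

A lower weighted score means a cleaner split, so a tree builder can try several candidate splits and keep the one with the smallest value.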
Neural Networks (50 minutes)
 Presentation: Neural network fundamentals
 Walkthrough: Building a neural network to classify colors
 Exercise: Testing the model
 Q&A