Test-Driven Machine Learning

Book description

Control your machine learning algorithms using test-driven development to achieve quantifiable milestones

About This Book

  • Build smart extensions to pre-existing features at work that can help maximize their value
  • Quantify your models to drive real improvement
  • Take your knowledge of basic concepts, such as linear regression and Naïve Bayes classification, to the next level and productionize your models
  • Play what-if games with your models and techniques by following the test-driven exploration process

Who This Book Is For

This book is intended for data technologists (scientists, analysts, or developers) with previous machine learning experience who are also comfortable reading code in Python. You may be starting, or have already started, a machine learning project at work and are looking for a way to deliver results quickly to enable rapid iteration and improvement. Those looking for examples of how to isolate issues in models and improve them will find ideas in this book to move forward.

What You Will Learn

  • Get started with an introduction to test-driven development and familiarize yourself with how to apply these concepts to machine learning
  • Build and test a neural network deterministically, and learn to look for niche cases that cause odd model behavior
  • Learn to use the multi-armed bandit algorithm to make optimal choices in the face of an enormous amount of uncertainty
  • Generate complex and simple random data to create a wide variety of test cases that can be codified into tests
  • Develop models iteratively, even when using a third-party library
  • Quantify model quality to enable collaboration and rapid iteration
  • Adopt simpler approaches to common machine learning algorithms
  • Use behavior-driven development principles to articulate test intent

In Detail

Machine learning is the process of teaching machines to remember data patterns, use them to predict future outcomes, and offer choices that would appeal to individuals based on their past preferences.

Machine learning is applicable to a lot of what you do every day. As a result, you can't take forever to deliver your first iteration of software. Learning to build machine learning algorithms within a controlled test framework will speed up your time to deliver, quantify quality expectations with your clients, and enable rapid iteration and collaboration.

This book will show you how to quantifiably test machine learning algorithms. Its foundational approach starts every example algorithm with the simplest thing that could possibly work. Even seasoned veterans will find simpler ways to begin a machine learning algorithm. You will learn how to iterate on these algorithms to enable rapid delivery and improve performance expectations.

The book begins with an introduction to test-driving machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression. You will discover how to test different approaches to Naïve Bayes and compare them quantitatively, along with how to apply object-oriented programming (OOP) and OOP patterns to test-driven code, leveraging scikit-learn.

Finally, you will walk through the development of an algorithm that maximizes the expected value of profit for a marketing campaign by combining one of the classifiers covered with the multiple regression example in the book.
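
As a rough illustration of the idea (not the book's actual algorithm), expected profit per customer can be sketched as the classifier's response probability times the regression's predicted order value, minus the cost of contacting the customer. The function names, the dictionary fields, and the flat contact cost below are all hypothetical, chosen only for this sketch.

```python
# Hypothetical sketch: combine a classifier output (p_respond) with a
# regression output (predicted_value) to decide whom to contact.
# The field names and the flat contact cost are assumptions, not the book's code.

def expected_profit(p_respond, predicted_value, contact_cost):
    """Expected profit of contacting one customer."""
    return p_respond * predicted_value - contact_cost


def choose_customers(customers, contact_cost):
    """Return the ids of customers whose expected profit is positive."""
    return [
        c["id"]
        for c in customers
        if expected_profit(c["p_respond"], c["predicted_value"], contact_cost) > 0
    ]


customers = [
    {"id": "a", "p_respond": 0.10, "predicted_value": 50.0},
    {"id": "b", "p_respond": 0.02, "predicted_value": 40.0},
    {"id": "c", "p_respond": 0.50, "predicted_value": 10.0},
]
selected = choose_customers(customers, contact_cost=2.0)
print(selected)
```

With a contact cost of 2.0, customer "b" is skipped because the expected return (0.02 × 40 = 0.8) does not cover the cost.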

Style and approach

An example-driven guide that builds a deeper knowledge and understanding of iterative machine learning development, test by test. Each topic develops solutions using failing tests to illustrate problems; these are followed by steps to pass the tests, simply and straightforwardly. Topics which use generated data explore how the data was generated, alongside explanations of the assumptions behind different machine learning techniques.
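
To give a flavor of this style, here is a minimal sketch of a test written against generated data. It is not taken from the book: the helper names, the seed, and the tolerances are assumptions, and NumPy's `polyfit` stands in for whatever model the book builds. Seeding the random generator keeps the test deterministic, and the Given/When/Then comments state the test's intent.

```python
import numpy as np


def generate_linear_data(n, slope, intercept, noise_sd, seed):
    """Generate noisy points along a known line (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # fixed seed makes the data repeatable
    x = rng.uniform(0, 10, size=n)
    y = slope * x + intercept + rng.normal(0, noise_sd, size=n)
    return x, y


def fit_line(x, y):
    """Fit a straight line by least squares; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def test_fit_recovers_known_slope():
    # Given data generated from a known line with a little noise
    x, y = generate_linear_data(n=200, slope=3.0, intercept=1.0,
                                noise_sd=0.5, seed=42)
    # When we fit the model
    slope, intercept = fit_line(x, y)
    # Then the fitted parameters are close to the ones used to generate the data
    assert abs(slope - 3.0) < 0.1
    assert abs(intercept - 1.0) < 0.3


test_fit_recovers_known_slope()
```

Because the generating parameters are known, the assertion thresholds become a quantified quality bar that later iterations of the model can be held to.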

Table of contents

  1. Test-Driven Machine Learning
    1. Table of Contents
    2. Test-Driven Machine Learning
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Introducing Test-Driven Machine Learning
      1. Test-driven development
      2. The TDD cycle
        1. Red
        2. Green
        3. Refactor
      3. Behavior-driven development
      4. Our first test
        1. The anatomy of a test
          1. Given
          2. When
          3. Then
      5. TDD applied to machine learning
      6. Dealing with randomness
      7. Different approaches to validating the improved models
        1. Classification overview
        2. Regression
        3. Clustering
      8. Quantifying the classification models
      9. Summary
    9. 2. Perceptively Testing a Perceptron
      1. Getting started
      2. Summary
    10. 3. Exploring the Unknown with Multi-armed Bandits
      1. Understanding a bandit
      2. Testing with simulation
      3. Starting from scratch
      4. Simulating real world situations
      5. A randomized probability matching algorithm
      6. A bootstrapping bandit
      7. The problem with straight bootstrapping
      8. Multi-armed bandit throw down
      9. Summary
    11. 4. Predicting Values with Regression
      1. Refresher on advanced regression
        1. Regression assumptions
        2. Quantifying model quality
      2. Generating our own data
      3. Building the foundations of our model
      4. Cross-validating our model
      5. Generating data
      6. Summary
    12. 5. Making Decisions Black and White with Logistic Regression
      1. Generating logistic data
      2. Measuring model accuracy
      3. Generating a more complex example
      4. Test driving our model
      5. Summary
    13. 6. You're So Naïve, Bayes
      1. Gaussian classification by hand
      2. Beginning the development
      3. Summary
    14. 7. Optimizing by Choosing a New Algorithm
      1. Upgrading the classifier
      2. Applying our classifier
      3. Upgrading to Random Forest
      4. Summary
    15. 8. Exploring scikit-learn Test First
      1. Test-driven design
      2. Planning our journey
        1. Creating a classifier chooser (it needs to run tests to evaluate classifier performance)
      3. Getting choosey
      4. Developing testable documentation
        1. Decision trees
      5. Summary
    16. 9. Bringing It All Together
      1. Starting at the highest level
      2. The real world
      3. What we've accomplished
      4. Summary
    17. Index

Product information

  • Title: Test-Driven Machine Learning
  • Author(s): Justin Bozonier
  • Release date: November 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781784399085