CHAPTER 3Predictive Model Building: Balancing Performance, Complexity, and Big Data

This chapter discusses the factors affecting the performance of machine learning models. The chapter provides technical definitions of performance for different types of machine learning problems. In an ecommerce application, for example, good performance might mean returning correct search results or presenting ads that site visitors frequently click. In a genetic problem, it might mean isolating a few genes responsible for a heritable condition. The chapter describes relevant performance measures for these different problems.

The goal of selecting and fitting a predictive algorithm is to achieve the best possible performance. Achieving performance goals involves three factors: complexity of the problem, complexity of the predictive model employed, and the amount and richness of the data available.

In this chapter you will learn that achieving high performance on a complicated problem requires a complicated model, but that a complicated model requires a large, rich data set for adequate training. When your problem is less complicated or not much data is available, then a less complicated model will be the best choice. This process involves two things. First, you need models whose complexity is easily adjustable and second, you need a way to measure their performance. This chapter will discuss both of those things for some particular examples, with a particular focus on performance measurements. ...

Get Machine Learning with Spark and Python, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.