CHAPTER 3
Predictive Model Building: Balancing Performance, Complexity, and Big Data
This chapter discusses the factors affecting the performance of machine learning models. The chapter provides technical definitions of performance for different types of machine learning problems. In an ecommerce application, for example, good performance might mean returning correct search results or presenting ads that site visitors frequently click. In a genetic problem, it might mean isolating a few genes responsible for a heritable condition. The chapter describes relevant performance measures for these different problems.
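The following sketch shows what such technical definitions can look like in practice. It computes a few common measures: mean squared error and mean absolute error for regression problems (numeric targets), and misclassification rate for binary classification problems. The function names and the 0.5 decision threshold are assumptions for illustration. They are not definitions taken from this chapter.

```python
def mean_squared_error(actual, predicted):
    """Regression: average of squared differences between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Regression: average absolute difference; penalizes large errors less than MSE."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def misclassification_rate(labels, scores, threshold=0.5):
    """Binary classification: fraction of thresholded predictions that are wrong."""
    # Assumed convention: a score above the threshold means "predict class 1".
    predictions = [1 if s > threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))                   # 0.25
print(mean_absolute_error([3.0, 5.0], [2.5, 5.5]))                  # 0.5
print(misclassification_rate([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.5
```

The choice of measure depends on the problem. A click-through prediction, for example, is naturally scored as a classification error, while predicting a numeric quantity calls for a regression measure such as MSE.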
The goal of selecting and fitting a predictive algorithm is to achieve the best possible performance. Achieving performance goals involves three factors: complexity of the problem, complexity of the predictive model employed, and the amount and richness of the data available.
In this chapter you will learn that achieving high performance on a complicated problem requires a complicated model, but that a complicated model requires a large, rich data set for adequate training. When your problem is less complicated or not much data is available, a less complicated model will be the best choice. Striking this balance requires two things: first, models whose complexity is easily adjustable, and second, a way to measure their performance. This chapter discusses both for several specific examples, with particular emphasis on performance measurement. ...
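To make the trade-off concrete, here is a minimal sketch. It is not the book's own code, and it uses a hypothetical synthetic problem (a noisy sine curve). The adjustable-complexity model is k-nearest-neighbors regression, chosen here for illustration: a smaller k gives a more flexible, more complex fit. Performance is measured as mean squared error on held-out data that the model never saw during training. The names `knn_predict`, `make_data`, and `holdout_error` are illustrative only.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict y at x by averaging the targets of the k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def make_data(n, rng):
    """Hypothetical synthetic problem: y = sin(2x) plus Gaussian noise (sd 0.3)."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ys = [math.sin(2.0 * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def holdout_error(n_train, k, n_test=500, seed=0):
    """Train on n_train points, then report MSE on a fresh held-out sample."""
    rng = random.Random(seed)
    train_x, train_y = make_data(n_train, rng)
    test_x, test_y = make_data(n_test, rng)
    preds = [knn_predict(train_x, train_y, x, k) for x in test_x]
    return mse(test_y, preds)


if __name__ == "__main__":
    # Compare several model complexities (k) on a small and a large training set.
    for n_train in (20, 1000):
        for k in (1, 5, 15):
            err = holdout_error(n_train, k)
            print(f"n_train={n_train:5d}  k={k:2d}  holdout MSE={err:.3f}")
```

Because the error is measured on data held out from training, it penalizes both underfitting (k too large for the data) and overfitting (k too small). The best setting generally shifts toward more complex models as the training set grows.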