Chapter 10. Improving Models and Data Extraction
Sometimes, no matter how good an algorithm is, it just doesn’t work. Or worse, it doesn’t pick up anything. Data can be quite noisy, and sometimes it’s just about impossible to figure out what went wrong. This chapter focuses on improving what you already have, either by selecting better features or by transforming your features into a new set. We do this by monitoring metrics, whether they come from cross-validation or from production monitoring.
This chapter is something of a smorgasbord, because there are many ways to fix a model.
The Problem with the Curse of Dimensionality
As we’ve talked about before, the curse of dimensionality is a big problem with distance-based machine learning algorithms. Generally speaking, as the number of dimensions increases, the average distance between points also goes up. Take, for instance, the case in Figure 10-1, where we see a perfect sphere centered at (0, 0, 0).
Figure 10-1. In the case of three dimensions, the average distance is 1 because the sphere is perfect
Everything is fine in three dimensions, but what if we project only onto two dimensions? What ends up happening is quite illuminating (see Figure 10-2).
Figure 10-2. In this case of dimension = 2, the average distance ...
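The claim made at the start of this section, that average distance tends to grow with the number of dimensions, can be checked with a small simulation. The following is a minimal sketch and not the book's own example: it assumes points drawn uniformly at random from the unit hypercube, and the function name average_pairwise_distance, the sample size of 200, and the chosen dimension counts are all illustrative.

```python
import math
import random


def average_pairwise_distance(n_points, n_dims, seed=0):
    """Mean Euclidean distance over all pairs of random points.

    Assumption: points are sampled uniformly from the unit hypercube
    [0, 1]^n_dims. Other distributions would give different numbers.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]

    total = 0.0
    pairs = 0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            total += math.dist(points[i], points[j])
            pairs += 1
    return total / pairs


# Compare the same number of points across increasing dimension counts.
averages = {d: average_pairwise_distance(200, d) for d in (1, 2, 3, 10, 100)}

for d, avg in averages.items():
    print(f"dimensions={d:>3}  average distance={avg:.3f}")
```

Under these assumptions the printed averages should rise as the dimension count rises, which is the effect that hurts distance-based algorithms: with more dimensions, every point drifts farther from every other point, so "near" neighbors become less meaningful.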