4 DEALING WITH LARGE NUMBERS OF FEATURES


In the previous chapter, we talked about overfitting, that is, using too many features for the amount of data at hand. A large number of features can also lead to long computation times. This chapter is all about reducing the size of our feature set, in other words, dimension reduction.

Note that the issue is not simply one of using fewer features; we also need to decide which features, or even which combinations of features, to use. We’ll cover principal component analysis (PCA), one of the best-known techniques for dealing with large values of p (the number of features), which is based on forming new features by combining old ones and ...
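As a rough sketch of that idea (not an example from the book itself), the Python code below uses scikit-learn's PCA to replace a synthetic 10-feature data matrix with two new features, each a linear combination of the original columns. The data, the seed, and the choice of two components are assumptions made purely for illustration.

    import numpy as np
    from sklearn.decomposition import PCA

    # Synthetic data: 200 rows, 10 correlated features (values chosen only for the demo).
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 3))
    X = base @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

    # Form 2 new features, each a linear combination of the 10 original ones.
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                 # (200, 2): 10 features reduced to 2
    print(pca.explained_variance_ratio_)   # share of variance retained by each new feature

The new features produced this way are uncorrelated and ordered by how much of the original variance they retain, which is what makes PCA a natural tool for deciding how many combined features to keep.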
