Selecting the best individual features

If we have a number of features, the problem of finding the best subset is a difficult task. It relates to solving the data mining problem itself, multiple times. As we saw in Chapter 4, Recommending Movies Using Affinity Analysis, subset-based tasks increase exponentially as the number of features increase. This exponential growth in the time needed is also true for finding the best subset of features.

One basic workaround to this problem is not to look for a subset that works well together, rather than just finding the best individual features. This univariate feature selection gives us a score based on how well a feature performs by itself. This is usually done for classification tasks, and we generally ...

Get Learning Data Mining with Python - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.