Let's begin by instantiating a SelectKBest object. We set k manually to 5, meaning that we wish to keep only the five best features according to the resulting p-values:
# keep only the best five features according to p-values of ANOVA test
k_best = SelectKBest(f_classif, k=5)
We can then fit and transform our X matrix to select the features we want, as we did before with our custom selector:
# matrix after selecting the top 5 features
k_best.fit_transform(X, y)

# 30,000 rows x 5 columns
array([[ 2,  2, -1, -1, -2],
       [-1,  2,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       ...,
       [ 4,  3,  2, -1,  0],
       [ 1, -1,  0,  0,  0],
       [ 0,  0,  0,  0,  0]])
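For readers who want to run the two steps above without the chapter's dataset, here is a minimal, self-contained sketch. It assumes scikit-learn and NumPy are installed. The data comes from `make_classification`, and the names `X_demo`, `y_demo`, and `k_best_demo` are hypothetical stand-ins for our X, y, and k_best. Note that SelectKBest ranks features by score; with f_classif, a higher F-score corresponds to a lower p-value, so the two orderings agree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the X matrix and y response (not the chapter's data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Same two steps as above: instantiate with k=5, then fit and transform
k_best_demo = SelectKBest(f_classif, k=5)
X_selected = k_best_demo.fit_transform(X_demo, y_demo)

print(X_selected.shape)  # (200, 5)

# Indices of the columns that were kept, in their original column order
kept = k_best_demo.get_support(indices=True)
print(kept)

# Cross-check: the kept columns are the five with the highest F-scores
top5 = np.sort(np.argsort(k_best_demo.scores_)[-5:])
print(np.array_equal(kept, top5))
```

The transformed matrix keeps the selected columns in their original left-to-right order rather than sorting them by score, which is worth remembering when mapping the output back to column names.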
If we want to inspect the p-values directly and see which columns were chosen, we can dive deeper into the ...