Building Machine Learning Systems with Python - Third Edition
by Luis Pedro Coelho, Willi Richert, Matthieu Brucher
Mutual information
When looking at feature selection, we should not focus on the type of relationship as we did in the previous section (linear relationships). Instead, we should think in terms of how much information one feature provides, given that we already have another.
To understand this, let's pretend that we want to use features from the house_size, number_of_levels, and avg_rent_price feature sets to train a classifier that outputs whether the house has an elevator or not. In this example, we can intuitively see that, knowing house_size, we don't need to know number_of_levels anymore, as it contains, somehow, redundant information. With avg_rent_price, it's different because we cannot infer the value of rental space simply from the ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access