Feature extraction and principal component analysis

Sometimes we have an overwhelming number of columns and not enough rows to support that many columns.

A great example of this came up in our Naïve Bayes example, when we were looking at the phrase send cash now. We had literally zero instances of texts containing that exact phrase, so instead we turned to a naïve assumption of independence between words, which allowed us to extrapolate a probability for each of our categories.
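The naïve assumption above can be sketched with a small example. The word probabilities and class priors below are hypothetical placeholders, not numbers from the book's dataset; the point is only the mechanics of multiplying per-word probabilities as if the words were independent.

```python
# Hypothetical per-word conditional probabilities, as if estimated from a corpus.
# These numbers are made up for illustration.
p_word_given_spam = {"send": 0.2, "cash": 0.3, "now": 0.25}
p_word_given_ham = {"send": 0.05, "cash": 0.01, "now": 0.1}

p_spam, p_ham = 0.4, 0.6  # hypothetical class priors


def naive_score(words, p_word_given_class, prior):
    """Multiply the prior by each word's conditional probability,
    treating the words as independent (the 'naive' assumption)."""
    score = prior
    for word in words:
        score *= p_word_given_class[word]
    return score


phrase = ["send", "cash", "now"]
spam_score = naive_score(phrase, p_word_given_spam, p_spam)
ham_score = naive_score(phrase, p_word_given_ham, p_ham)

# Normalize the two scores into a probability for each category.
p_spam_given_phrase = spam_score / (spam_score + ham_score)
```

Even though the exact phrase never appeared in training, each individual word did, so the product of per-word probabilities still yields a usable estimate for both categories.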

We had this problem in the first place because of something called the curse of dimensionality.

The curse of dimensionality says that as we introduce and consider new feature columns, we need almost exponentially more rows (data points) in order to fill in the empty spaces that ...
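The sparsity this describes can be demonstrated with a small simulation, a minimal sketch under assumed parameters (200 random points, dimensions chosen for illustration): with a fixed number of rows, points drawn uniformly from the unit hypercube drift farther and farther from their nearest neighbor as the number of feature columns grows.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_nearest_neighbor_distance(n_points, n_dims):
    """Average distance from each point to its nearest neighbor,
    for points drawn uniformly from the unit hypercube."""
    points = rng.random((n_points, n_dims))
    # Pairwise Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    return dists.min(axis=1).mean()


# With the same 200 rows, neighbors get much farther apart as columns are added.
for n_dims in (2, 10, 100):
    print(n_dims, mean_nearest_neighbor_distance(200, n_dims))
```

The same 200 rows that densely cover a 2-dimensional space leave a 100-dimensional space almost entirely empty, which is why adding columns without adding rows degrades estimates.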
