
C# Machine Learning Projects by Yoon Hyup Hwang


Feature engineering for email data

We briefly looked at word distributions for spam and ham emails in the previous step, and we noticed a couple of things. First, a large number of the most frequently occurring words are common words that carry little meaning. For example, words like to, the, for, and a appear in nearly every email, so our ML algorithms would not learn much from them. These types of words are called stop words, and they are often ignored or dropped from the feature set. We will use NLTK's list of stop words to filter commonly used words out of our feature set. You can download the NLTK list of stop words from here: https://github.com/yoonhwang/c-sharp-machine-learning/blob/master/ch.2/stopwords.txt. One way to ...
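The excerpt is cut off before the book's own C# code appears, so the following is only a minimal Python sketch of the general idea of stop-word filtering, not the author's implementation. It assumes the stop-word file holds one word per line; the function names (`load_stop_words`, `tokenize`, `remove_stop_words`) and the sample sentence are hypothetical, and a four-word inline list stands in for the downloaded file.

```python
import re


def load_stop_words(lines):
    """Build a stop-word set from an iterable of lines.

    Assumption: one stop word per line, as in a plain-text word list.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def tokenize(text):
    """Lowercase the text and return its alphabetic runs as tokens."""
    # A deliberately simple tokenizer; real email text may need more care.
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop-word set, in order."""
    return [token for token in tokens if token not in stop_words]


# Inline stand-in for the downloaded stop-word file (hypothetical sample).
sample_stop_words = load_stop_words(["to", "the", "for", "a"])

tokens = tokenize("Claim the prize for a free trip to Paris")
filtered = remove_stop_words(tokens, sample_stop_words)
print(filtered)  # ['claim', 'prize', 'free', 'trip', 'paris']
```

To use a real list, one could pass an open file object to `load_stop_words`, since it accepts any iterable of lines. Storing the words in a set keeps each membership check constant-time, which matters once the filter runs over every token of every email.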
