
C# Machine Learning Projects by Yoon Hyup Hwang

Data preparation

Now that we have clearly stated and defined the problem that we are going to solve with ML, we need data. No data, no ML. Typically, data preparation is preceded by a separate step in which you collect and gather the data you need, but in this book we are going to use a pre-compiled, labeled dataset that is publicly available. In this chapter, we use the CSDMC2010 SPAM corpus dataset (http://csmining.org/index.php/spam-email-datasets-.html) to train and test our models. Follow the link and download the compressed data from the bottom of the web page. Once you have downloaded and decompressed the data, you will see two folders named TESTING and TRAINING, and a text file named SPAMTrain.label ...
