Tune me up!

You already know my pros and cons. One of my cons is that my classification accuracy is relatively low. However, if you tune me up, I can perform much better. So, should we trust Naive Bayes? If so, shouldn't we look at how to increase its prediction performance? Let's try it on the WebSpam dataset. First, we will observe the performance of the NB model as it is; after that, we will see how to increase that performance using the cross-validation technique.
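To make the idea of tuning through cross-validation concrete before touching the real data, here is a minimal sketch in plain Python rather than Spark. It assumes that the hyperparameter being tuned is the additive smoothing value of a multinomial Naive Bayes model, which is one common choice and not necessarily the only thing tuned in this chapter. The function names (`train_nb`, `predict_nb`, `cross_val_accuracy`, `tune_alpha`) and the six-message toy corpus are hypothetical and exist only for illustration; they are not part of the WebSpam dataset or of any library.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha):
    """Fit a multinomial Naive Bayes model with additive smoothing alpha (> 0)."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    model = {}
    for y, n_docs in class_counts.items():
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_like = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
        model[y] = (log_prior, log_like)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    def score(y):
        log_prior, log_like = model[y]
        return log_prior + sum(log_like[w] for w in doc if w in log_like)
    return max(sorted(model), key=score)


def cross_val_accuracy(docs, labels, alpha, k):
    """Pooled accuracy over k folds; every example is held out exactly once."""
    n = len(docs)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_docs = [docs[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        model = train_nb(train_docs, train_labels, alpha)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / n


def tune_alpha(docs, labels, alphas, k=3):
    """Score each candidate alpha by cross-validation and return the best one."""
    scores = {a: cross_val_accuracy(docs, labels, a, k) for a in alphas}
    best = max(alphas, key=lambda a: scores[a])
    return best, scores


# Hypothetical toy corpus (NOT the WebSpam data): three spam, three ham messages.
DOCS = [
    ["buy", "cheap", "pills"], ["cheap", "pills", "now"], ["buy", "now", "win"],
    ["meeting", "at", "noon"], ["lunch", "at", "noon"], ["project", "meeting", "notes"],
]
LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]

best_alpha, scores = tune_alpha(DOCS, LABELS, [0.01, 0.1, 1.0, 10.0])
print("accuracy per alpha:", scores)
print("selected alpha:", best_alpha)
```

The point of the sketch is the workflow: each candidate setting is judged only on examples the model did not see during training, and the setting with the best held-out accuracy wins. On a corpus this small the scores mean very little, so treat the printed numbers as a demonstration of the mechanics, not as evidence about Naive Bayes.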

The WebSpam dataset, which can be downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/webspam_wc_normalized_trigram.svm.bz2, contains features and their corresponding labels, that is, spam or ham. Therefore, this is a supervised machine learning problem, ...
