O'Reilly logo

Learning Predictive Analytics with R by Eric Mayor

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Working with k-NN in R

When explaining the way k-NN works, we have used the same data as training and testing data. The risk here is overfitting: there is noise in the data almost always (for instance due to measurement errors) and testing on the same dataset does not let us examine the impact of noise on our classification. In other words, we want to ensure that our classification reflects real associations in the data.

There are several ways to solve this issue. The most simple is to use a different set of data for training and testing. We have already seen this when discussing Naïve Bayes. Another, better, solution is to use cross-validation. In cross-validation, the data is split in any number of parts lower than the number of observations. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required