March 2018
Beginner to intermediate
570 pages
13h 42m
English
Before we move on, we should talk about some of the limitations of k-NN.
First, unless you use an optimized implementation of k-NN, classification can be slow, because it requires calculating the distance from the test data point to every data point in the training set; more sophisticated implementations partially mitigate this cost, for example by storing the training data in a spatial index such as a k-d tree.
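To make that cost concrete, here is a minimal sketch of an unoptimized k-NN classifier in Python. It is an illustration under stated assumptions, not the implementation used elsewhere in this book: the function names `euclidean` and `knn_predict` are hypothetical, and the six training points are made-up values that only loosely resemble two iris predictors.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # One distance computation per training point: this is the slow part,
    # because the work grows linearly with the size of the training set.
    distances = [
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep the k closest neighbors.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up training data with two predictors (illustrative values only).
train_points = [
    (1.4, 0.2), (1.3, 0.2), (1.5, 0.3),
    (4.7, 1.4), (4.5, 1.5), (4.9, 1.5),
]
train_labels = [
    "setosa", "setosa", "setosa",
    "versicolor", "versicolor", "versicolor",
]

prediction = knn_predict(train_points, train_labels, (1.6, 0.25), k=3)
print(prediction)
```

Every call to `knn_predict` loops over the whole training set, so classifying many test points against a large training set multiplies the work accordingly. Note that `euclidean` accepts points with any number of coordinates, so the same sketch applies unchanged when there are more than two predictors.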
Second, vanilla k-NN can perform poorly when the number of predictor variables becomes too large. In the iris example, we used only two predictors, which can be plotted in two-dimensional space, where the Euclidean distance is just the 2-D Pythagorean theorem that we learned in middle school. A classification problem with n predictors is represented in n-dimensional space; the Euclidean ...