To find out which problem we have, we plot the training and test errors for increasing training-set sizes (so-called learning curves) and check whether the gap between the two curves closes.
High bias typically shows up as a test error that decreases a little at first and then plateaus at a high value, while the training error rises toward it as the dataset grows. High variance shows up as a large gap between the two curves: a low training error and a much higher test error.
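As a minimal sketch of how such learning curves can be computed, the plain-Python code below fits a model on the first n training samples for several values of n and records the error on those n samples and on a fixed held-out test set. It assumes a regression task scored by mean squared error. The names `learning_curve`, `fit_predict` and `mean_predictor`, along with the synthetic quadratic data, are hypothetical and chosen only for illustration. A model that always predicts the training mean is a deliberately high-bias choice, so its two error curves should end up close together at a high value.

```python
import random
import statistics

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def learning_curve(fit_predict, X_train, y_train, X_test, y_test, sizes):
    """For each n in sizes, train on the first n samples; return (train_errors, test_errors).

    fit_predict(train_X, train_y, query_X) is a hypothetical callable that fits a
    model on (train_X, train_y) and returns predictions for query_X.
    """
    train_errors, test_errors = [], []
    for n in sizes:
        Xs, ys = X_train[:n], y_train[:n]
        # Error on the same samples the model was fitted on
        train_errors.append(mse(ys, fit_predict(Xs, ys, Xs)))
        # Error on held-out samples the model never saw
        test_errors.append(mse(y_test, fit_predict(Xs, ys, X_test)))
    return train_errors, test_errors

def mean_predictor(train_X, train_y, query_X):
    """High-bias model: ignores the inputs and always predicts the training mean."""
    m = statistics.fmean(train_y)
    return [m] * len(query_X)

# Synthetic data (assumption): y = x^2 plus Gaussian noise, x uniform in [-3, 3]
rng = random.Random(0)

def make_data(n):
    X = [rng.uniform(-3, 3) for _ in range(n)]
    y = [x * x + rng.gauss(0, 0.5) for x in X]
    return X, y

X_train, y_train = make_data(500)
X_test, y_test = make_data(200)
sizes = [10, 50, 100, 250, 500]
train_err, test_err = learning_curve(mean_predictor, X_train, y_train, X_test, y_test, sizes)

for n, tr, te in zip(sizes, train_err, test_err):
    print(f"n={n:4d}  train={tr:.2f}  test={te:.2f}  gap={te - tr:.2f}")
```

Plotting `train_err` and `test_err` against `sizes` gives the learning curves. For this constant model, both errors settle near the variance of the target, which is the high-bias pattern described above.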
Plotting the errors for different dataset sizes for the 5-nearest-neighbor model (5NN) shows a large gap between the training and test errors, which hints at a high-variance problem.
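The sketch below shows how the 5NN learning curves could be produced without any libraries. It assumes a binary classification task scored by error rate (the fraction of misclassified points). The synthetic dataset (points labeled 1 inside the unit circle, with 20% of labels flipped as noise) and the helper names `knn_predict` and `error_rate` are hypothetical, and the printed numbers are not meant to reproduce the article's plot.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, query_X, k=5):
    """Predict each query point's label by majority vote among its k nearest training points."""
    preds = []
    for q in query_X:
        # Indices of the k training points with the smallest squared Euclidean distance to q
        nearest = sorted(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], q)),
        )[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds

def error_rate(y_true, y_pred):
    """Fraction of misclassified points."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic 2-D data (assumption): label 1 inside the unit circle, 20% label noise
rng = random.Random(42)

def make_data(n, noise=0.2):
    X = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n)]
    y = []
    for x1, x2 in X:
        label = int(x1 * x1 + x2 * x2 < 1.0)
        y.append(1 - label if rng.random() < noise else label)
    return X, y

X_train, y_train = make_data(400)
X_test, y_test = make_data(200)

sizes = [25, 50, 100, 200, 400]
train_err, test_err = [], []
for n in sizes:
    Xs, ys = X_train[:n], y_train[:n]
    # Each training point counts among its own 5 neighbors, which keeps the training error low
    train_err.append(error_rate(ys, knn_predict(Xs, ys, Xs)))
    test_err.append(error_rate(y_test, knn_predict(Xs, ys, X_test)))

for n, tr, te in zip(sizes, train_err, test_err):
    print(f"n={n:4d}  train={tr:.3f}  test={te:.3f}  gap={te - tr:.3f}")
```

Because every training point votes for itself, k-NN's training error tends to be optimistic, and this is one reason a persistent train/test gap is a typical sign of variance for small k.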
As the test error does not decrease with more data, we have to rethink the ...