If you analyze the serial version of the k-nearest neighbors algorithm, you can find two points where the algorithm can be parallelized:
- The computation of the distances: Every loop iteration that calculates the distance between the input example and one of the examples of the training dataset is independent of the others, so the iterations can run concurrently
- The sort of the distances: Java 8 added the parallelSort() method to the Arrays class to sort arrays in a concurrent way
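To make the second point concrete, the following is a minimal sketch of how a sort step could switch between the serial and the concurrent sort. The class name ParallelSortSketch, the method sortDistances(), and the boolean flag are assumptions made for this illustration; only Arrays.sort() and Arrays.parallelSort() come from the Java API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the original algorithm's code.
class ParallelSortSketch {

    // Returns a sorted copy of the distances, leaving the input array untouched.
    // The flag selects between the serial and the concurrent sort.
    static double[] sortDistances(double[] distances, boolean parallel) {
        double[] copy = Arrays.copyOf(distances, distances.length);
        if (parallel) {
            Arrays.parallelSort(copy); // concurrent sort, available since Java 8
        } else {
            Arrays.sort(copy);         // serial sort
        }
        return copy;
    }

    public static void main(String[] args) {
        double[] distances = {4.2, 0.5, 3.1, 1.7};
        System.out.println(Arrays.toString(sortDistances(distances, true)));
    }
}
```

Both branches produce the same order; whether the concurrent sort is actually faster depends on the size of the array and the number of available cores, so it is worth measuring with your own data.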
In the first concurrent version of the algorithm, we are going to create one task per distance that we need to calculate, that is, one task per example of the training dataset. We are also going to make it possible to sort the array of distances concurrently. ...
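As a rough sketch of this fine-grained approach, the following code submits one task per training example to an executor and waits for all of them with a CountDownLatch before sorting. It is an illustration under assumptions, not the implementation developed in this chapter: the names FineGrainedKnnSketch, Distance, euclidean(), and sortedDistances(), the use of a fixed thread pool, and the Euclidean metric are all choices made for this example.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one task per distance, optional concurrent sort.
class FineGrainedKnnSketch {

    // Pairs the index of a training example with its distance to the input.
    static class Distance implements Comparable<Distance> {
        final int index;
        final double value;

        Distance(int index, double value) {
            this.index = index;
            this.value = value;
        }

        @Override
        public int compareTo(Distance other) {
            return Double.compare(value, other.value);
        }
    }

    // Euclidean distance between two examples of the same length (assumed metric).
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    static Distance[] sortedDistances(double[][] train, double[] input, boolean parallelSort) {
        Distance[] distances = new Distance[train.length];
        ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        CountDownLatch pending = new CountDownLatch(train.length);
        try {
            for (int i = 0; i < train.length; i++) {
                final int index = i;
                // Each task writes only to its own slot, so no locking is needed.
                executor.execute(() -> {
                    try {
                        distances[index] = new Distance(index, euclidean(train[index], input));
                    } finally {
                        pending.countDown();
                    }
                });
            }
            pending.await(); // wait until every distance has been calculated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } finally {
            executor.shutdown();
        }
        if (parallelSort) {
            Arrays.parallelSort(distances);
        } else {
            Arrays.sort(distances);
        }
        return distances;
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {3, 4}, {1, 0}};
        Distance[] result = sortedDistances(train, new double[] {0, 0}, true);
        System.out.println("Nearest training example: " + result[0].index); // prints 0
    }
}
```

Creating one task per distance keeps the code simple, but each task does very little work, so the overhead of creating and scheduling the tasks can be significant for large training datasets.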