CHAPTER 21
Nearest Neighbours
Nearest neighbours algorithms are popular in data exploration and pattern recognition. In this chapter, we cover the foundations of nearest neighbours techniques and present three particular algorithms. These methods are by definition model free: we impose only mild – or no – assumptions on the data, and the outcome is derived entirely from the data set itself. The inevitable consequence is the black-box nature of the model: we can hardly construct a narrative for the link between the independent and dependent variables without inspecting the raw data itself.
Nearest neighbour–based algorithms are quite successful in classification problems. They can also be used for regression, but their performance there is generally not as strong as in classification; high-dimensional problems in particular are challenging for nearest neighbour–based regression. Regression is therefore often better handled by other methods discussed in this book, such as linear regression (with regularisation), trees/forests, or neural networks.
In this section, we present three specific algorithms based on the principles of nearest neighbours and implement them in q:
- k-nearest neighbours classifier
- Prototype clustering
- Feature selection based on the local nearest neighbourhood
21.1 k-Nearest Neighbours Classifier
The first method of the machine learning approach we present ...
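The book implements these algorithms in q; as a purely illustrative preview of the idea, here is a minimal k-nearest neighbours classifier sketched in Python. All names and the toy data are hypothetical and are not taken from the book's code. The classifier labels a query point by a majority vote among the labels of its k closest training points.

```python
# Minimal k-nearest neighbours classifier sketch (illustrative only;
# the book's own implementation is in q, not this Python code).
from collections import Counter
import math

def knn_classify(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [math.dist(p, query) for p in train_X]
    # indices of the k smallest distances
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # majority vote over the neighbours' labels
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in 2-D
train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_X, train_y, (1.5, 1.5), k=3))  # → a
print(knn_classify(train_X, train_y, (8.2, 8.5), k=3))  # → b
```

Note that k is a hyperparameter: small values make the classifier sensitive to noise, while large values blur class boundaries; an odd k avoids ties in two-class problems.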