CHAPTER 21
Nearest Neighbours
Nearest neighbours algorithms are popular in data exploration and pattern recognition. In this chapter, we cover the foundations of nearest neighbours techniques and present three particular algorithms. These methods are by definition model free: we impose only mild – or no – assumptions on the data, and the outcome is derived entirely from the data set itself. The inevitable consequence is the black-box nature of the model: we can hardly construct a narrative for the link between the independent and dependent variables without inspecting the raw data itself.
Nearest neighbour–based algorithms are quite successful in classification problems. They can also be used for regression, but their performance there is generally not as strong as in classification; high-dimensional problems in particular are challenging for nearest neighbour–based regression. Regression is therefore often better handled by other methods discussed in this book, such as linear regression (with regularisation), trees/forests, or neural networks.
In this section, we present three specific algorithms based on the principles of nearest neighbours and implement them in q:
- k-nearest neighbours classifier
- Prototype clustering
- Feature selection based on the local nearest neighbourhood
21.1 k-Nearest Neighbours Classifier
The first method of the machine learning approach we present ...
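The book implements these algorithms in q; as a purely illustrative preview of the idea, here is a minimal k-nearest neighbours classifier sketched in Python. All names and the toy data are hypothetical and are not taken from the book's code. The classifier labels a query point by a majority vote among the labels of its k closest training points.

```python
# Minimal k-nearest neighbours classifier sketch (illustrative only;
# the book's own implementation is in q, not this Python code).
from collections import Counter
import math

def knn_classify(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [math.dist(p, query) for p in train_X]
    # indices of the k smallest distances
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # majority vote over the neighbours' labels
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in 2-D
train_X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_X, train_y, (1.5, 1.5), k=3))  # → a
print(knn_classify(train_X, train_y, (8.2, 8.5), k=3))  # → b
```

Note that k is a hyperparameter: small values make the classifier sensitive to noise, while large values blur class boundaries; an odd k avoids ties in two-class problems.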