7 k‐NEAREST NEIGHBORS (k‐NN)

In this chapter, we describe the k‐nearest neighbors algorithm that can be used for classification (of a categorical outcome) or prediction (of a numerical outcome). To classify or predict a new record, the method relies on finding “similar” records in the training data. These “neighbors” are then used to derive a classification or prediction for the new record by voting (for classification) or averaging (for prediction). We explain how similarity is determined, how the number of neighbors is chosen, and how a classification or prediction is computed. k‐NN is a highly automated data‐driven method. We discuss the advantages and weaknesses of the k‐NN method in terms of performance and practical considerations such as computational time.
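To make the voting-versus-averaging distinction concrete, here is a minimal sketch using Python's scikit-learn library rather than JMP (which this book uses); the toy data, class labels, and the choice of k = 3 are hypothetical, for illustration only.

```python
# Minimal k-NN sketch with scikit-learn (illustrative; the book itself uses JMP Pro).
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Toy training data: two numerical predictors per record (values are made up).
X_train = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.5], [5.5, 7.5]]
y_class = ["owner", "owner", "nonowner", "nonowner", "owner", "nonowner"]  # categorical outcome
y_num = [10.0, 12.0, 30.0, 33.0, 9.0, 29.0]                                # numerical outcome

new_record = [[2.0, 2.0]]

# Classification: the new record is assigned the majority class among its k neighbors.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_class)
print(clf.predict(new_record))  # ['owner']

# Prediction: the new record is assigned the average outcome of its k neighbors.
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_num)
print(reg.predict(new_record))  # [10.33...] = mean of 10.0, 12.0, 9.0
```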

k‐NN in JMP: k‐NN is only available in JMP Pro.

7.1 THE k‐NN CLASSIFIER (CATEGORICAL OUTCOME)

The idea in k-nearest neighbors methods is to identify k records in the training dataset that are similar to a new record that we wish to classify. We then use these similar (neighboring) records to classify the new record, assigning it to the predominant class among the neighbors. Denote by $(x_1, x_2, \ldots, x_p)$ the values of the predictors for this new record. We look for records in our training data that are similar, or "near," the record to be classified in the predictor space (i.e., records that have predictor values close to $(x_1, x_2, \ldots, x_p)$).
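A bare-bones illustration of this idea follows; the use of Euclidean distance as the similarity measure and the default of k = 3 are assumptions for this sketch (how similarity is measured and how k is chosen are discussed later in the chapter).

```python
# From-scratch sketch of k-NN classification (Euclidean distance assumed).
import math
from collections import Counter

def knn_classify(new_record, X_train, y_train, k=3):
    """Classify new_record by majority vote among its k nearest training records."""
    # Distance from the new record to every training record in predictor space.
    dists = [(math.dist(new_record, x), label) for x, label in zip(X_train, y_train)]
    # The k nearest neighbors are the k training records with the smallest distances.
    neighbors = sorted(dists, key=lambda d: d[0])[:k]
    # Assign the predominant (majority) class among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, `knn_classify([2.0, 2.0], X_train, y_class, k=3)` with the toy data above returns "owner", the majority class among the three training records nearest to (2.0, 2.0).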
