Chapter 7k-Nearest Neighbor Algorithm

  1. 7.1 Classification Task
  2. 7.2 k-Nearest Neighbor Algorithm
  3. 7.3 Distance Function
  4. 7.4 Combination Function
  5. 7.5 Quantifying Attribute Relevance: Stretching the Axes
  6. 7.6 Database Considerations
  7. 7.7 k-Nearest Neighbor Algorithm for Estimation and Prediction
  8. 7.8 Choosing k
  9. 7.9 Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler
    1. The R Zone
    2. Exercises
    3. Hands-On Analysis

7.1 Classification Task

Perhaps the most common data mining task is that of classification. Examples of classification tasks may be found in nearly every field of endeavor:

  • Banking: determining whether a mortgage application is a good or bad credit risk, or whether a particular credit card transaction is fraudulent
  • Education: placing a new student into a particular track with regard to special needs
  • Medicine: diagnosing whether a particular disease is present
  • Law: determining whether a will was written by the actual person deceased or fraudulently by someone else
  • Homeland security: identifying whether or not certain financial or personal behavior indicates a possible terrorist threat

In classification, there is a target categorical variable, (e.g., income bracket), which is partitioned into predetermined classes or categories, such as high income, middle income, and low income. The data mining model examines a large set of records, each record containing information on the target variable as well as a set of input or predictor variables. For example, consider the ...

Get Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.