Chapter 7: k-Nearest Neighbor Algorithm

  1. 7.1 Classification Task
  2. 7.2 k-Nearest Neighbor Algorithm
  3. 7.3 Distance Function
  4. 7.4 Combination Function
  5. 7.5 Quantifying Attribute Relevance: Stretching the Axes
  6. 7.6 Database Considerations
  7. 7.7 k-Nearest Neighbor Algorithm for Estimation and Prediction
  8. 7.8 Choosing k
  9. 7.9 Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler
    1. The R Zone
    2. Exercises
    3. Hands-On Analysis

7.1 Classification Task

Perhaps the most common data mining task is that of classification. Examples of classification tasks may be found in nearly every field of endeavor:

  • Banking: determining whether a mortgage application is a good or bad credit risk, or whether a particular credit card transaction is fraudulent
  • Education: placing a new student into a particular track with regard to special needs
  • Medicine: diagnosing whether a particular disease is present
  • Law: determining whether a will was written by the deceased or fraudulently by someone else
  • Homeland security: identifying whether or not certain financial or personal behavior indicates a possible terrorist threat

In classification, there is a categorical target variable (e.g., income bracket) that is partitioned into predetermined classes or categories, such as high income, middle income, and low income. The data mining model examines a large set of records, each containing information on the target variable as well as a set of input or predictor variables. For example, consider the ...

