Chapter 10k-Nearest Neighbor Algorithm

10.1 Classification Task

Perhaps the most common data mining task is that of classification. Examples of classification tasks may be found in nearly every field of endeavor:

  • Banking: Determining whether a mortgage application is a good or bad credit risk, or whether a particular credit card transaction is fraudulent.
  • Education: Placing a new student into a particular track with regard to special needs.
  • Medicine: Diagnosing whether a particular disease is present.
  • Law: Determining whether a will was written by the actual person deceased or fraudulently by someone else.
  • Homeland security: Identifying whether or not certain financial or personal behavior indicates a possible terrorist threat.

In classification, there is a target categorical variable, (e.g., income bracket), which is partitioned into predetermined classes or categories, such as high income, middle income, and low income. The data mining model examines a large set of records, each record containing information on the target variable as well as a set of input or predictor variables. For example, consider the excerpt from a data set shown in Table 10.1. Suppose that the researcher would like to be able to classify the income bracket of persons not currently in the database, based on the other characteristics associated with that person, such as age, gender, and occupation. This task is a classification task, very nicely suited to data mining methods and techniques.

Table ...

Get Data Mining and Predictive Analytics, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.