Chapter 7. Classification Analysis

In the context of data analysis, the main idea of classification is the partition of a dataset into labeled subsets. If the dataset is a table in a database, then this partitioning could amount to no more than the addition of a new attribute (that is, a new table column) whose domain (that is, range of values) is a set of labels.
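Here is a minimal sketch of the idea that classification can amount to adding one labeled column. Each table row is a map from attribute name to value. A classifier function computes a new attribute, and its return values form the label domain. The class `LabelColumn`, the methods `addLabel` and `row`, the attributes `Name`, `Weight`, and `Size`, the 150-gram threshold, and the sample rows are hypothetical illustrations. They are not the data of Table 7-1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LabelColumn {
    /** Builds one hypothetical row; LinkedHashMap keeps the column order stable. */
    static Map<String, Object> row(String name, int weight) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("Name", name);
        r.put("Weight", weight);
        return r;
    }

    /** Returns copies of the rows with a new attribute whose value is the classifier's label. */
    static List<Map<String, Object>> addLabel(List<Map<String, Object>> table,
                                              String attribute,
                                              Function<Map<String, Object>, String> classifier) {
        List<Map<String, Object>> labeled = new ArrayList<>();
        for (Map<String, Object> r : table) {
            Map<String, Object> copy = new LinkedHashMap<>(r); // leave the original row untouched
            copy.put(attribute, classifier.apply(r));
            labeled.add(copy);
        }
        return labeled;
    }

    public static void main(String[] args) {
        // Hypothetical rows and threshold, for illustration only.
        List<Map<String, Object>> table = List.of(row("apple", 180), row("grape", 5));
        List<Map<String, Object>> labeled = addLabel(table, "Size",
                r -> (Integer) r.get("Weight") > 150 ? "large" : "small");
        System.out.println(labeled);
        // prints [{Name=apple, Weight=180, Size=large}, {Name=grape, Weight=5, Size=small}]
    }
}
```

The domain of the new `Size` attribute is the label set {"large", "small"}. Rows that share a label form one of the labeled subsets.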

For example, we might have the table of 16 fruits shown in Table 7-1:


Figure 7-1. The meta-algorithm generates the algorithm

The last column, labeled Sweet, is a nominal attribute that can be used to classify fruit: either it's sweet or it isn't. Presumably, every fruit type that exists could ...
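Once a nominal attribute such as Sweet exists, you can recover the labeled subsets by grouping rows on its value. The sketch below assumes Java 16 or later, because it uses records. The `Fruit` record, the `"yes"`/`"no"` encoding of Sweet, and the three sample rows are hypothetical and do not reproduce Table 7-1.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class NominalPartition {
    /** One row of a hypothetical fruit table; sweet is a nominal attribute ("yes" or "no"). */
    record Fruit(String name, String sweet) {}

    /** Partitions the rows into labeled subsets, one per distinct value of Sweet. */
    static Map<String, List<String>> partitionBySweet(List<Fruit> fruits) {
        return fruits.stream()
                .collect(Collectors.groupingBy(Fruit::sweet, TreeMap::new,
                        Collectors.mapping(Fruit::name, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Hypothetical rows, for illustration only.
        List<Fruit> fruits = List.of(
                new Fruit("banana", "yes"),
                new Fruit("lemon", "no"),
                new Fruit("mango", "yes"));
        System.out.println(partitionBySweet(fruits)); // prints {no=[lemon], yes=[banana, mango]}
    }
}
```

The sketch uses `groupingBy` rather than `partitioningBy` so that it stays general. It works for a nominal attribute with any number of labels, not only a two-valued one like Sweet.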
