第7章 分类分析
在数据分析的背景下,分类的主要思想是将数据集划分到标记好的子集中。如果数据集是数据库中的一张表,那么这种划分可能仅仅是增加了一个新属性(即表中的新列),它的域(即取值范围)是标签的集合。
例如表7-1展示的16种水果的表。
表7-1 水果示例的训练集
Name |
Size |
Color |
Surface |
Sweet |
---|---|---|---|---|
apple |
MEDIUM |
RED |
SMOOTH |
T |
apricot |
MEDIUM |
YELLOW |
FUZZY |
T |
banana |
MEDIUM |
YELLOW |
SMOOTH |
T |
cherry |
SMALL |
RED |
SMOOTH |
T |
coconut |
LARGE |
BROWN |
FUZZY |
T |
cranberry |
SMALL |
RED |
SMOOTH |
F |
fig |
SMALL |
BROWN |
ROUGH |
T |
grapefruit |
LARGE |
YELLOW |
ROUGH |
F |
kumquat |
SMALL |
ORANGE |
SMOOTH |
F |
lemon |
MEDIUM |
YELLOW |
ROUGH |
F |
orange |
MEDIUM |
ORANGE |
ROUGH |
T |
peach |
MEDIUM |
ORANGE |
FUZZY |
T |
pear |
MEDIUM |
YELLOW |
SMOOTH |
T |
pineapple |
LARGE |
BROWN |
ROUGH |
T |
pumpkin ... |
Get Java数据分析指南 now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.