Decision Trees and Random Forests
print(forest.validate(folds))
print("Calculating score for regression tree")
regression = MushroomRegression(data)
print(regression.validate(folds))
Running this code shows the following output. In the output, actual is the value recorded in the data set, which we hold to be true, and preds is the result we got out of the model we built:
Calculating score for decision tree
[preds     0    1
 actual
 0       844    0
 1         0  781,
 preds     0    1
 actual
 0       834    0
 1         0  791,
 preds     0    1
 actual
 0       814    0
 1         0  811,
 preds     0    1
 actual
 0       855    0
 1         0  770,
 preds     0    1
 actual
 0       861    0
 1         0  763]
Calculating score for random forest method
[preds     0    1
 actual
 0       841    0
 1         0  784,
 preds     0    1
 actual
 0       869    0
 1         0  756,
 preds     0    1
 actual
 0       834    0
 1         0  791,
 preds     0    1
 actual
 0       835    0
 1         0  790,
 preds     0    1
 actual
 0       829    0
 1         0  795]
Calculating score for regression tree
[0.0, 0.0, 0.0, 0.0, 0.0]
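To make the actual-versus-preds tables above easier to read, here is a minimal sketch of how such a table can be counted and turned into an accuracy figure. It uses only the standard library; the helper names `confusion_matrix` and `accuracy` are assumptions made for illustration and are not the chapter's own `validate` method, whose implementation is not shown in this section.

```python
from collections import Counter


def confusion_matrix(actual, preds, labels=(0, 1)):
    """Count (actual, predicted) pairs; rows are actual, columns are preds."""
    counts = Counter(zip(actual, preds))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal, where prediction equals truth."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Labels shaped like the first decision-tree fold above: 844 zeros and
# 781 ones, every one of them predicted correctly.
actual = [0] * 844 + [1] * 781
preds = list(actual)

matrix = confusion_matrix(actual, preds)
print(matrix)            # [[844, 0], [0, 781]]
print(accuracy(matrix))  # 1.0
```

The off-diagonal cells count the mistakes. They are zero in every fold printed above, which is why both the decision tree and the random forest classify each fold without error.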
What you'll notice is that, given this toy example, we are able to create a decision tree that does exceptionally well: every fold above is classified without a single error. Does that mean we should go out to the woods and eat mushrooms? No, but given the training data and information we gathered, we have built a highly accurate model for mapping mushrooms to either poisonous or edible! The resulting decision tree is actually quite fascinating, as you can see in Figure 5-8.
Figure 5-8. e resulting tree from building decision trees
I don't think it's important to discuss what this tree means, but it is interesting to think of mushroom poisonousness as a function of a handful of decision nodes.
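The idea of a classification being "a function of a handful of decision nodes" can be sketched in a few lines. The tree below is hypothetical: its attribute names, values, and leaves are invented for illustration and are not the tree in Figure 5-8, and `TREE` and `predict` are assumed names rather than anything from the chapter's code.

```python
# Hypothetical tree: the attributes, values, and leaves below are invented
# for illustration and are NOT the tree learned from the mushroom data.
TREE = {
    "attribute": "odor",
    "branches": {
        "foul": "poisonous",
        "none": {
            "attribute": "spore_print_color",
            "branches": {"green": "poisonous", "white": "edible"},
        },
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, following the sample's attribute values."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node  # a leaf is just a class label


print(predict(TREE, {"odor": "foul"}))
print(predict(TREE, {"odor": "none", "spore_print_color": "white"}))
```

Each internal node asks about one attribute and each leaf holds a class, so a prediction costs at most as many lookups as the tree is deep.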
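Conclusion
In this chapter, we learned how to classify data using decision trees. They are useful for making hierarchical classifications and for cases where certain attributes determine the split points well. We showed that decision trees and random forests are both well suited to classifying the edibility of mushrooms. But remember: don't use this method in the wild to classify mushrooms! Find a mycologist.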
