Python 机器学习实践:测试驱动的开发方法

by Matthew Kirk
January 2018
Content level: Intermediate to advanced
211 pages
8h 31m
Chinese
China Machine Press
Content preview from Python 机器学习实践:测试驱动的开发方法
Decision Trees and Random Forests
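Information theory primer: entropy is a way of determining how much information is needed to describe an outcome. The canonical example is that if it is sunny in Death Valley with a probability of 100%, the entropy of sending the day's weather information is 0. The information does not need to be encoded, since there is nothing to report.
An example of high entropy, by contrast, is a complex password. The more varied the digits and characters you use, the higher the entropy. The same holds for attributes: if there are many possible mushroom odors, the entropy is higher.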
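The two examples above can be checked numerically. The following is a minimal sketch, assuming the standard Shannon entropy in bits, H = -Σ p·log2(p); the function name `shannon_entropy` and the uniform odor distributions are illustrative assumptions, not taken from the book.

```python
import math


def shannon_entropy(probabilities):
    """Return the Shannon entropy, in bits, of a discrete distribution.

    Assumes the probabilities sum to 1. Zero-probability outcomes are
    skipped because they contribute nothing to the sum.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Always sunny: a single outcome with probability 1 carries no information.
print(shannon_entropy([1.0]))        # 0.0

# Hypothetical attribute with 2 equally likely odors versus 8 equally
# likely odors: more possible values means higher entropy.
print(shannon_entropy([0.5] * 2))    # 1.0
print(shannon_entropy([0.125] * 8))  # 3.0
```

With equally likely values, entropy grows as log2 of the number of values, which is why an attribute with many possible odors scores higher than one with few.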
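Gini Impurity
Not to be confused with the Gini coefficient, Gini impurity is a probabilistic measure. It combines how likely an attribute value is to show up with the probability of it being mistaken.
The formula for impurity is: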
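The preview ends before the formula itself appears. As a minimal sketch, assuming the standard textbook definition of Gini impurity, 1 - Σ p_i², where p_i is the fraction of items carrying label i, the measure can be computed as follows. The function name `gini_impurity` and the sample labels are illustrative assumptions, and the book's own notation may differ.

```python
from collections import Counter


def gini_impurity(labels):
    """Return the Gini impurity of a non-empty collection of class labels.

    Uses the standard definition 1 - sum(p_i ** 2), which equals the
    probability of mislabeling a random item when labels are assigned
    at random according to the observed label frequencies.
    """
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# A pure group has no impurity.
print(gini_impurity(["edible"] * 4))                       # 0.0

# An even two-way split is the most impure case for two classes.
print(gini_impurity(["edible"] * 2 + ["poisonous"] * 2))   # 0.5
```

Like entropy, the value is 0 for a pure group and rises as the labels become more evenly mixed.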

Publisher Resources

ISBN: 9787111581666