
精通数据科学算法 (Mastering Data Science Algorithms)

ISBN: 9781836204596

by Posts & Telecom Press, David Natingga

May 2024

Intermediate to advanced

181 pages

3h 9m

Chinese

Packt Publishing


Copyright Information
Copyright Notice
Summary
About the Author
Acknowledgments
About the Reviewers
Preface
Resources and Support
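Chapter 1 Solving Classification Problems with the k-Nearest Neighbors Algorithm
1.1 Mary and her sense of temperature
1.2 Implementing the k-nearest neighbors algorithm
1.3 Italy region example: choosing the value of k
1.4 House ownership: data transformation
1.5 Text classification: using non-Euclidean distances
1.6 Text classification: k-NN in higher dimensions
1.7 Summary
1.8 Exercises
Chapter 2 Naive Bayes
2.1 Medical test: a basic application of Bayes' theorem
2.2 Proof of Bayes' theorem and its extension
2.3 Playing chess: independent events
2.4 Implementing a naive Bayes classifier
2.5 Playing chess: dependent events
2.6 Gender classification: Bayes' theorem for continuous random variables
2.7 Summary
2.8 Exercises
Chapter 3 Decision Trees
3.1 Swimming preference: representing data with a decision tree
3.2 Information theory
3.3 ID3 algorithm: constructing a decision tree
3.4 Classifying with a decision tree
3.5 Summary
3.6 Exercises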
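Chapter 4 Random Forests
4.1 Overview of the random forest algorithm
4.2 Swimming preference: analysis with a random forest
4.3 Implementing the random forest algorithm
4.4 Chess example
4.5 Shopping analysis: overcoming inconsistencies in random data and measuring the confidence level
4.6 Summary
4.7 Exercises
Chapter 5 k-means Clustering
5.1 Household income: clustering into k clusters
5.2 Gender classification: classification by clustering
5.3 Implementing the k-means clustering algorithm
5.4 Property ownership example: choosing the number of clusters
5.5 Summary
5.6 Exercises
Chapter 6 Regression Analysis
6.1 Converting Fahrenheit to Celsius: linear regression on complete data
6.2 Predicting weight from height: linear regression on real-world data
6.3 The gradient descent algorithm and its implementation
6.4 Predicting flight duration from distance
6.5 Ballistic flight analysis: a nonlinear model
6.6 Summary
6.7 Exercises
Chapter 7 Time Series Analysis
7.1 Business profit: trend analysis
7.2 Electronics store sales: seasonality analysis
7.3 Summary
7.4 Exercises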
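Appendix A Statistics
A.1 Basic concepts
A.2 Bayesian inference
A.3 Distributions
A.4 Cross-validation
A.5 A/B testing
Appendix B R Reference
B.1 Introduction
B.2 Data types
B.3 Linear regression
Appendix C Python Reference
C.1 Introduction
C.2 Data types
C.3 Control flow
Appendix D Glossary of Algorithms and Methods in Data Science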


Chapter 3 Decision Trees

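A decision tree arranges data in a tree structure. At each node the data is sent down a different branch, depending on the value of the attribute tested at that node.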
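The definition above can be made concrete with a minimal Python sketch. Python is the language the chapter uses for its ID3 implementation. The representation here is an assumption of our own and may differ from the book's. An internal node is a tuple `(attribute, branches)`, where `branches` maps each attribute value to a subtree, and a leaf is a plain class label. The attribute names `swimming_suit` and `water_temperature` are hypothetical labels for the columns of Table 3-1 in Section 3.1. The example tree is read directly off that table.

```python
# Assumed representation (not necessarily the book's):
#   internal node -> (attribute, {attribute_value: subtree})
#   leaf          -> class label (a plain string)

def classify(tree, sample):
    """Walk from the root, following the branch that matches the sample's
    value for the attribute tested at each node, until a leaf is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[sample[attribute]]
    return tree

# The swimming-preference tree implied by Table 3-1 (hypothetical attribute names).
swim_tree = ("swimming_suit", {
    "None": "No",
    "Small": "No",
    "Good": ("water_temperature", {"Cold": "No", "Warm": "Yes"}),
})

print(classify(swim_tree, {"swimming_suit": "Good", "water_temperature": "Warm"}))   # Yes
print(classify(swim_tree, {"swimming_suit": "Small", "water_temperature": "Warm"}))  # No
```

A leaf is recognized simply by not being a tuple, so classification stops as soon as a class label is reached. The water-temperature question is therefore never asked for samples whose swimming suit is None or Small.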

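This chapter builds a decision tree with the standard ID3 learning algorithm. At each step the algorithm chooses the attribute to split the data samples on, aiming to maximize the information gain, a measure based on information entropy.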
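The two quantities ID3 relies on can be sketched from their standard definitions. Suppose the classes in a set S occur with proportions p_i. The entropy of S is then H(S) = -Σ p_i log2 p_i. The information gain of splitting S on attribute A is IG(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v), where S_v is the subset of S with A = v. The sketch below illustrates these definitions and is not the book's implementation. The function names and dictionary keys are hypothetical. It is applied to the six rows of Table 3-1 from Section 3.1.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    # p * log2(1/p) equals -p * log2(p); this form avoids returning -0.0
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy of the subsets
    obtained by splitting the rows on `attribute`."""
    before = entropy([row[target] for row in rows])
    after = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

# Table 3-1; the dictionary keys are illustrative names, not the book's.
rows = [
    {"swimming_suit": "None",  "water_temperature": "Cold", "swim": "No"},
    {"swimming_suit": "None",  "water_temperature": "Warm", "swim": "No"},
    {"swimming_suit": "Small", "water_temperature": "Cold", "swim": "No"},
    {"swimming_suit": "Small", "water_temperature": "Warm", "swim": "No"},
    {"swimming_suit": "Good",  "water_temperature": "Cold", "swim": "No"},
    {"swimming_suit": "Good",  "water_temperature": "Warm", "swim": "Yes"},
]

gains = {a: information_gain(rows, a, "swim") for a in ("swimming_suit", "water_temperature")}
for attribute, gain in gains.items():
    print(attribute, round(gain, 3))  # swimming_suit 0.317, then water_temperature 0.191
print("root attribute:", max(gains, key=gains.get))  # swimming_suit
```

The whole table has entropy of about 0.650 bits (five No, one Yes). Splitting on the swimming suit leaves only one mixed group of two rows (Good). Splitting on water temperature leaves a mixed group of three rows (Warm). The first split therefore gains more information and is chosen for the root, which matches Figure 3-1.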

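In this chapter you will learn:

what a decision tree is, and how to represent the data from the swimming preference example as a decision tree;
the information-theory concepts of information entropy and information gain, first in theory and then applied to the swimming preference example;
how to implement the ID3 algorithm in Python and construct a decision tree from training data;
how to classify new data items with the decision tree built for the swimming preference example;
how to analyze the chess game from Chapter 2 with a decision tree instead of the method used there, and how the results of the two algorithms differ;
when a decision tree is an appropriate method of analysis;
how to handle inconsistent data while building a decision tree, using the shopping example.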

3.1 Swimming preference: representing data with a decision tree

A person may have certain preferences about when to go swimming. These preferences are recorded in Table 3-1:

Table 3-1

Swimming suit	Water temperature	Swim preference
None	Cold	No
None	Warm	No
Small	Cold	No
Small	Warm	No
Good	Cold	No
Good	Warm	Yes

The data in this table can be represented by the decision tree shown in Figure 3-1.


Figure 3-1

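At the root node we ask: what kind of swimming suit does the person have? The answer splits the available data into three groups of two rows each. If the attribute "Swimming suit" is None, the attribute "Swim preference" is No. No further question about water temperature is needed, because every sample with Swimming suit = None is classified as No. The same holds when Swimming suit is Small. When Swimming suit is Good, the remaining two rows fall into two classes: No and Yes. ...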
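The split described above can be checked with a short sketch. It groups the rows of Table 3-1 by the answer to the root question and reports which groups are already pure. The helper `split_by` and the tuple layout are hypothetical choices made for illustration. The last two rows of the table show that the Good group is separated by water temperature: Cold gives No and Warm gives Yes.

```python
from collections import defaultdict

# Table 3-1 as (swimming suit, water temperature, swim preference) tuples.
TABLE_3_1 = [
    ("None", "Cold", "No"),
    ("None", "Warm", "No"),
    ("Small", "Cold", "No"),
    ("Small", "Warm", "No"),
    ("Good", "Cold", "No"),
    ("Good", "Warm", "Yes"),
]

def split_by(rows, column):
    """Group rows by their value in `column`: one group per branch."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)

root_groups = split_by(TABLE_3_1, 0)  # root question: which swimming suit?
for value, subset in root_groups.items():
    labels = sorted({row[2] for row in subset})
    verdict = f"leaf: {labels[0]}" if len(labels) == 1 else f"mixed {labels}, split further"
    print(f"{value}: {len(subset)} rows, {verdict}")
# None: 2 rows, leaf: No
# Small: 2 rows, leaf: No
# Good: 2 rows, mixed ['No', 'Yes'], split further

# Only the Good branch needs a second question, about water temperature.
good_groups = split_by(root_groups["Good"], 1)
print({temp: subset_rows[0][2] for temp, subset_rows in good_groups.items()})
# {'Cold': 'No', 'Warm': 'Yes'}
```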
