Practical Statistics for Data Scientists, 2nd Edition (Chinese edition: 数据科学中的实用统计学(第2版))

by Peter Bruce, Andrew Bruce, Peter Gedeck
October 2021
Intermediate to advanced
289 pages
8h 31m
Chinese
Posts & Telecom Press
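The K that best balances overfitting against oversmoothing is usually determined by an accuracy metric, specifically accuracy on holdout or validation data. Because the best K depends largely on the nature of the data itself, there is no general rule for choosing it. For highly structured data with little noise, smaller values of K work best. Borrowing a term from the signal processing community, such data has a very high signal-to-noise ratio (SNR). Data sets for handwriting recognition and speech recognition are examples of data that typically have a high SNR. For noisier, less structured data (that is, data with a low SNR, such as loan data), larger values of K are more appropriate. In general, K falls between 1 and 20, and an odd value of K is often chosen so that two classes cannot receive the same number of votes.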
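The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's own code: the synthetic two-class data, the helper names `make_data`, `knn_predict`, and `accuracy`, and the noise level are all hypothetical, and only the standard library is used. It scores odd values of K between 1 and 19 on a holdout set and keeps the one with the highest accuracy.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - query[0]) ** 2 + (row[0][1] - query[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, holdout, k):
    """Fraction of holdout records whose predicted class matches the true class."""
    hits = sum(knn_predict(train, x, k) == y for x, y in holdout)
    return hits / len(holdout)


def make_data(n, rng, noise=1.5):
    """Hypothetical toy data: two overlapping Gaussian classes (a noisy problem)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        point = (rng.gauss(center, noise), rng.gauss(center, noise))
        data.append((point, label))
    return data


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)     # data used to find neighbors
holdout = make_data(100, rng)   # data used only to score each K

candidate_ks = range(1, 21, 2)  # odd K values from 1 to 19, which avoids ties
scores = {k: accuracy(train, holdout, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

for k, acc in scores.items():
    print(f"K={k:2d}  holdout accuracy={acc:.3f}")
print("best K on this holdout set:", best_k)
```

Which K wins depends on the data and on the particular split, so the result of a single holdout run like this one should be treated as indicative rather than definitive.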
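The Bias-Variance Trade-off
The tension between oversmoothing and overfitting is an instance of the bias-variance trade-off, a pervasive problem in statistical model fitting. Variance refers to the modeling error that arises from the choice of training data; that is, if you had chosen a different training set, the resulting model would be somewhat different. Bias refers to the modeling error that arises because the model has not correctly captured the underlying real-world scenario; this error does not go away simply by adding more training data. When a flexible model overfits, the variance increases. You can reduce the variance by using a simpler model, but the bias then increases because some of the flexibility needed to model the true underlying situation is lost. A common way to handle this trade-off is cross-validation; see Section 4.2.3 for more details.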
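As a hedged illustration of the two error sources described above, the standard decomposition of expected squared error makes the trade-off explicit. The notation is an assumption introduced here and does not come from the text: the response is $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, $\hat f_D$ is the model fitted on a training set $D$, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat f_D(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

This form holds for squared-error loss; classification error has analogous but not identical decompositions. For KNN, a small K corresponds to the flexible, high-variance end of the trade-off and a large K to the smoother, higher-bias end.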
6.1.6 KNN as a Feature Engine
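KNN is simple and intuitive, which makes it very popular. In terms of performance, KNN generally cannot match more sophisticated classification techniques. In practical model fitting, however, as a staged process alongside other classification methods, it can be ...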
Publisher Resources

ISBN: 9787115569028