Practical Statistics for Data Scientists (2nd Edition)

by Peter Bruce, Andrew Bruce, Peter Gedeck
October 2021
Intermediate to advanced
289 pages
8h 31m
Chinese
Posts & Telecom Press
Regression and Prediction
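Key Ideas
Multiple linear regression models the relationship between a response variable $Y$ and multiple predictor variables $X_1, \ldots, X_p$.
The most important metrics for evaluating a model are root mean squared error (RMSE) and R-squared ($R^2$).
The standard error of the coefficients can be used to measure the reliability of a variable's contribution to the model.
Stepwise regression is a way to automatically determine which variables should be included in the model.
Weighted regression gives certain records more or less weight when fitting the regression equation.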
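The two metrics named in the key ideas can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's code: the function names `rmse` and `r_squared` and the toy values are assumptions made for the example, and the RMSE here divides by n rather than by the residual degrees of freedom.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed responses and fitted values (illustration only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

print(rmse(y_true, y_pred))       # 0.5
print(r_squared(y_true, y_pred))  # 0.95
```

RMSE is expressed in the units of the response, while R-squared is a unitless proportion of variance explained, which is why the two are usually reported together.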
4.2.6 Further Reading
An excellent treatment of cross-validation and resampling can be found in An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
4.3 Prediction Using Regression
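In data science, the primary purpose of regression is prediction. This is worth keeping in mind because regression, as an old and well-established statistical method, is still often regarded as a traditional tool for explanatory modeling rather than as a tool for prediction.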
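Key Terms in This Section
Prediction interval
An uncertainty interval around an individual predicted value.
Extrapolation
Extension of a model beyond the range of the data used to fit it.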
4.3.1 The Dangers of Extrapolation
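Regression models should not be extrapolated beyond the range of the data (except when regression is used for time series forecasting). The model is valid only for predictor values that are well represented in the data; even where there is enough data, other problems can arise (see Section 4.6).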
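One simple guard against accidental extrapolation is to compare each predictor of a new record with the range seen in the training data. The sketch below is an illustration under stated assumptions: the helper `out_of_range` and the column names `sqft_living` and `bedrooms` are hypothetical, and a min/max check is only a crude test, since it cannot detect sparse regions inside the range.

```python
def out_of_range(train_rows, new_row):
    """Return the predictors whose new value lies outside the training min/max."""
    flagged = []
    for name, value in new_row.items():
        observed = [row[name] for row in train_rows]
        if value < min(observed) or value > max(observed):
            flagged.append(name)
    return flagged

# Hypothetical training records (illustration only).
train = [
    {"sqft_living": 1200, "bedrooms": 2},
    {"sqft_living": 2400, "bedrooms": 4},
    {"sqft_living": 1800, "bedrooms": 3},
]

# A record with no building at all falls outside every observed range.
print(out_of_range(train, {"sqft_living": 0, "bedrooms": 0}))
# ['sqft_living', 'bedrooms']
```

A prediction for a flagged record can still be computed, but it should be treated as an extrapolation rather than as a supported estimate.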
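As an extreme example, suppose model_lm is used to predict the value of a 5,000-square-foot empty lot. In this case, every predictor related to the building has a value of 0, and the regression equation yields an absurd prediction: -521,900 + 5,000 × (-0.0605) ...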
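The expression can be evaluated as far as it is shown. In this sketch the variable names are hypothetical, and it is assumed that -521,900 is the intercept and -0.0605 is the coefficient on lot size in square feet; only the two numbers themselves come from the text.

```python
# Values taken from the expression in the text; the names are illustrative.
intercept = -521_900   # assumed to be the model intercept
coef_lot = -0.0605     # assumed to be the coefficient on lot size (sq ft)
lot_sqft = 5000        # the empty lot in the example

# All building-related predictors are 0, so only these two terms remain.
prediction = intercept + lot_sqft * coef_lot
print(round(prediction, 1))  # -522202.5
```

A negative price for a piece of land is clearly meaningless, which is the point of the example: the fitted data contained no parcels without a building, so the model has nothing to say about them.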
ISBN: 9787115569028