数据科学中的实用统计学(第2版) (Practical Statistics for Data Scientists, 2nd Edition)

by Peter Bruce, Andrew Bruce, Peter Gedeck
October 2021
Intermediate to advanced
289 pages
8h 31m
Chinese
Posts & Telecom Press
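4.4.3 Ordered Factor Variables
Some factor variables have levels that follow a ranking; these are called ordered factor variables or ordered categorical variables. For example, a loan grade could be A, B, C, and so on, with each grade carrying more risk than the one before it. Ordered factor variables can usually be converted to numeric values and used as numbers. For example, BldgGrade is an ordered factor variable. Table 4-1 lists several of the grade types. Although each grade has a specific meaning, the numeric values run from low to high in step with the quality of the building. In the regression model house_lm fit in Section 4.2, BldgGrade was treated as a numeric variable.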
Table 4-1: Building grades and their numeric values
Value   Description
1       Cabin
2       Substandard
5       Fair
10      Very good
12      Luxury
13      Mansion
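Treating an ordered factor as a numeric variable preserves the information contained in the ordering; that information would be lost if the variable were converted to a factor.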
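A minimal sketch of this distinction in Python with pandas. The loan grades, the column names, and the 1-to-3 numeric coding are illustrative assumptions, not values from the book's data. The numeric column keeps the ranking of the grades, while the dummy columns treat the levels as unrelated labels.

```python
import pandas as pd

# Hypothetical loan grades (illustrative only); A is least risky, C most risky.
loans = pd.DataFrame({"grade": ["B", "A", "C", "A", "B"]})

# Declare the level order explicitly so that A < B < C.
grade_levels = ["A", "B", "C"]
loans["grade"] = pd.Categorical(
    loans["grade"], categories=grade_levels, ordered=True
)

# Ordinal encoding: one numeric column that keeps the ranking (A=1, B=2, C=3).
# The choice of 1, 2, 3 is an assumption; it implies equally spaced grades.
loans["grade_num"] = loans["grade"].cat.codes + 1

# Dummy encoding of the same column: one indicator per level, no notion of order.
grade_dummies = pd.get_dummies(loans["grade"], prefix="grade")

print(loans)
print(grade_dummies)
```

A regression on `grade_num` estimates a single slope across grades, whereas a regression on the dummy columns estimates a separate effect per level and ignores their order.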
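Key Ideas
• Factor variables need to be converted into numeric variables before they can be used in a regression.
• The most common way to encode a factor variable with P distinct values is to represent it with P - 1 dummy variables.
• A factor variable with many levels, even in a very large data set, may need to be consolidated into a variable with fewer levels.
• Some factor variables have ordered levels and can be represented by a single numeric variable.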
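The P - 1 encoding can be sketched as follows in Python with pandas. The property types and column names here are hypothetical, chosen only for illustration. The level that is dropped serves as the reference level against which the remaining indicators are compared.

```python
import pandas as pd

# Hypothetical property types (illustrative data, not the book's data set).
houses = pd.DataFrame(
    {"property_type": ["Single Family", "Townhouse", "Multiplex", "Single Family"]}
)

# P is the number of distinct levels; here P = 3.
n_levels = houses["property_type"].nunique()

# drop_first=True omits the first level in sorted order ("Multiplex"),
# which becomes the reference level, leaving P - 1 indicator columns.
dummies = pd.get_dummies(houses["property_type"], prefix="type", drop_first=True)

print(n_levels)
print(dummies)
```

Keeping all P indicators alongside an intercept would make the columns linearly dependent, which is why one level is usually left out.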
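4.5 Interpreting the Regression Equation
In data science, the most important use of regression is to predict a dependent (outcome) variable. In some cases, however, there is also value in examining the regression equation itself to understand the nature of the relationship between the predictors and the outcome. This section offers guidance on examining and interpreting the regression equation.
Key Terms
Correlated variables
When the predictor variables are highly correlated, it is difficult to interpret the individual coefficients.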
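A small simulation can illustrate why correlated predictors are hard to interpret. This is a sketch under stated assumptions: the data are simulated, the variable names and the helper function `ols` are hypothetical, and NumPy's least-squares solver stands in for whatever regression routine is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 2000

# Simulated data (an assumption for illustration): x2 is almost a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
y = 2.0 * x1 + 0.5 * rng.normal(size=n)  # the outcome depends on x1 only

corr = np.corrcoef(x1, x2)[0, 1]


def ols(predictors, outcome):
    """Return least-squares coefficients, intercept first (hypothetical helper)."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef


coef_single = ols([x1], y)    # intercept, slope for x1
coef_both = ols([x1, x2], y)  # intercept, slopes for x1 and x2

print(f"correlation(x1, x2) = {corr:.4f}")
print(f"x1 alone:  slope = {coef_single[1]:.2f}")
print(
    f"x1 and x2: slopes = {coef_both[1]:.2f}, {coef_both[2]:.2f}, "
    f"sum = {coef_both[1] + coef_both[2]:.2f}"
)
```

In a setup like this, the sum of the two slopes is pinned down well by the data, but how that total is split between `x1` and `x2` is poorly determined, so neither individual coefficient can be read as the effect of its predictor.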
ISBN: 9787115569028