精通特征工程
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
Content preview from 精通特征工程
Fancy Tricks with Simple Numbers
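Figure 2-10. Scatter plots of review count (input) against average star rating (target) in the Yelp reviews dataset. The top panel uses the raw review count; the bottom panel uses the log-transformed review count.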
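The Importance of Data Visualization
Comparing the effect of the log transform on two different datasets illustrates the importance of visualizing the data. Here, we deliberately kept the input feature and the target variable very simple so that the relationship between them is easy to visualize. Plots like those in Figure 2-10 immediately reveal that the chosen model (a linear model) cannot possibly represent the relationship between this input and this target. On the other hand, one could convincingly model the distribution of review count given the average star rating. When building a model, it is good practice to visually inspect the relationships between the inputs and the output, and between the individual input features.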
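As a rough illustration of the kind of inspection described above, the following sketch draws a two-panel scatter plot of a count feature against a rating, once raw and once log-transformed. Everything in it is an assumption made for illustration: the data are synthetic (not the Yelp dataset), the helper names `make_synthetic_reviews`, `log_transform`, and `plot_raw_vs_log` are hypothetical, and the transform `log10(count + 1)` is one common choice rather than a requirement. Plotting assumes matplotlib is installed.

```python
import math
import random


def make_synthetic_reviews(n=500, seed=0):
    """Return (review_counts, avg_stars) lists of made-up data.

    Counts come from a heavily right-skewed distribution, so the raw
    scatter plot is squashed against the left edge. The star ratings are
    drawn independently of the counts; no real relationship is implied.
    """
    rng = random.Random(seed)
    counts = [int(rng.paretovariate(1.2)) for _ in range(n)]
    stars = [round(rng.uniform(1.0, 5.0) * 2) / 2 for _ in range(n)]
    return counts, stars


def log_transform(counts):
    """Apply log10(count + 1); the +1 keeps zero counts finite."""
    return [math.log10(c + 1) for c in counts]


def plot_raw_vs_log(counts, stars):
    """Draw raw counts (top) and log-transformed counts (bottom)."""
    # Imported lazily so the helpers above work without matplotlib.
    import matplotlib.pyplot as plt

    fig, (ax_raw, ax_log) = plt.subplots(2, 1, figsize=(6, 8))
    ax_raw.scatter(counts, stars, s=8)
    ax_raw.set_xlabel("review count")
    ax_raw.set_ylabel("average star rating")
    ax_log.scatter(log_transform(counts), stars, s=8)
    ax_log.set_xlabel("log10(review count + 1)")
    ax_log.set_ylabel("average star rating")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    counts, stars = make_synthetic_reviews()
    fig = plot_raw_vs_log(counts, stars)
    fig.savefig("raw_vs_log.png")
```

Keeping the transform in its own function means the same values can be fed to both the plot and a model, so what is visualized matches what is fitted.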
2.3.2 Power Transforms: Generalization of the Log Transform
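Power transforms are a family of transformations, and the log transform is just one special case of it. In statistical terms, these are variance-stabilizing transformations. To understand why variance stabilization is a useful property, consider the Poisson distribution. It is a heavy-tailed distribution whose variance equals its mean. Hence, the larger its center of mass, the larger its variance and the heavier its tail. A power transform changes the distribution of the variable so that the variance no longer depends on the mean. For example, suppose a random variable $X$ has a Poisson distribution. If we transform it by taking its square root, the variance of $\tilde{X} = \sqrt{X}$ is approximately constant instead of being equal to the mean. ...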
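The square-root claim can be checked numerically. A first-order (delta-method) approximation suggests why it holds: for $X$ Poisson with mean $\lambda$, $\mathrm{Var}(\sqrt{X}) \approx \left(\tfrac{1}{2\sqrt{\lambda}}\right)^2 \lambda = \tfrac{1}{4}$, which does not depend on $\lambda$ when $\lambda$ is reasonably large. The sketch below is a minimal simulation under stated assumptions: the helper names `poisson_sample` and `variance_summary` are hypothetical, the means 10, 50, and 100 are arbitrary, and the sampler is Knuth's simple multiplication method, chosen only to keep the example free of third-party dependencies.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Fine for moderate lam; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def variance_summary(lams, n=5000, seed=0):
    """Return (lam, variance of X, variance of sqrt(X)) for each mean."""
    rng = random.Random(seed)
    rows = []
    for lam in lams:
        xs = [poisson_sample(lam, rng) for _ in range(n)]
        var_raw = statistics.pvariance(xs)
        var_sqrt = statistics.pvariance([math.sqrt(x) for x in xs])
        rows.append((lam, var_raw, var_sqrt))
    return rows


if __name__ == "__main__":
    for lam, var_raw, var_sqrt in variance_summary([10, 50, 100]):
        print(f"lam={lam:>3}  Var(X)={var_raw:7.2f}  Var(sqrt X)={var_sqrt:.3f}")
```

In such a run the raw variance should track the mean, while the variance of the square-rooted values should stay near 1/4 across the three means. For small means the approximation is looser.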
Publisher Resources

ISBN: 9787115509680