Skip to Content
数据驱动力:企业数据分析实战
book

数据驱动力:企业数据分析实战

by Carl Anderson
April 2021
Intermediate to advanced
210 pages
6h 3m
Chinese
Posts & Telecom Press
Content preview from 数据驱动力:企业数据分析实战
186
附录
A
最小值的递减是对数线性的。这是一种无界的尾部极值的情况。说得更相关一些,对于最
小化问题,比如这里的情景匹配,就是要找一个下限,总而言之,要有一个完美匹配。比
如,可能有其他人站在同一个拍摄点拍了张景色相同的照片,但没有突兀的车。
我认为这就是
Norvig
的原理图所要表达的内容
。在特定的语料库规模下,我们已经找到了
相当不错的匹配,而扩大语料库的规模并不能改善结果。
综上所述,对于距离函数为非负的最近邻型最小化问题(这意味着代价函数的下界为零),
平均而言,该距离函数将随着数据或样本量的增加而单调递减。
A.2
 相对频率问题
第二类是
计数
问题或
相对频率
问题,这也是
Halevy
等人关注的重点。
Norvig
列出了几个
案例。细分的任务是需要将字符串(比如“
cheapdealsandstu
)分词成最有可能的单词
序列,这些字符串短到可以让我们对它们使用“暴力”方法进行可能的分词,但我们必须
评估每一种分词的可能性。最简单的做法统是假设单词的出现相互独立,也就是说,如果
Pr(
w
)
代表单词
w
在一些语料库中出现的频率,那么我们可以计算得出:
Pr(che,apdeals,andstuff) = Pr(che)
×
Pr(apdeals)
×
Pr(andstuff).
...
Pr(cheap,deals,and,stuff) = Pr(cheap)
×
Pr(deals)
×
Pr(and)
×
Pr(stuff).
我们当然也可以运用
N-grams
算法(比如使用
bigrams
):
Pr(
“cheap ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

Java数据科学实战

Java数据科学实战

Michael R. Brzustowicz, PhD
SQL经典实例(第2版)

SQL经典实例(第2版)

Anthony Molinaro, Robert de Graaf
机器学习流水线实战

机器学习流水线实战

Hannes Hapke, Catherine Nelson

Publisher Resources

ISBN: 9787115560179