
数据驱动力：企业数据分析实战

Name: 数据驱动力：企业数据分析实战
Author: Carl Anderson
ISBN: 9787115560179


April 2021

Intermediate to advanced

210 pages

6h 3m

Chinese

Posts & Telecom Press



Content preview from 数据驱动力：企业数据分析实战

Appendix

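The decrease of the minimum is log-linear. This is a case of extreme values drawn from an unbounded tail. More to the point, a minimization problem such as the scene matching discussed here has a lower bound: for all practical purposes, a perfect match. For example, someone else may have stood at the same spot and photographed the same scene, but without the obstructing car.

I think this is what Norvig's schematic is meant to show. At a certain corpus size we have already found a very good match, and enlarging the corpus further cannot improve the result.

In summary, for nearest-neighbor-type minimization problems with a non-negative distance function (meaning the cost function has a lower bound of zero), that distance will, on average, decrease monotonically as the amount of data or the sample size grows.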
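The behaviour summarized above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "corpus" is a list of uniform random numbers, the "query" is a fixed point, and the distance is the absolute difference. The function name `best_match_curve` and the checkpoint sizes are hypothetical and do not come from the book. Because the corpus is grown incrementally, the best distance found so far can never increase, and it can never drop below zero.

```python
import random


def best_match_curve(query, corpus, checkpoints):
    """Return (n, best distance) after seeing the first n corpus items,
    for every n listed in checkpoints."""
    wanted = set(checkpoints)
    best = float("inf")
    curve = []
    for n, item in enumerate(corpus, start=1):
        # Non-negative distance, so the cost is bounded below by zero.
        best = min(best, abs(query - item))
        if n in wanted:
            curve.append((n, best))
    return curve


# Assumed toy setup: uniform random "corpus" and a fixed query point.
rng = random.Random(42)
corpus = [rng.random() for _ in range(10_000)]
query = 0.5

curve = best_match_curve(query, corpus, [10, 100, 1_000, 10_000])
for n, distance in curve:
    print(f"corpus size {n:>6}: best distance {distance:.6f}")
```

Once a near-perfect match has been found, further data has very little room left to improve on it, which is the saturation effect described in the text.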

A.2 Relative Frequency Problems

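The second class consists of counting or relative frequency problems, which were the main focus of Halevy et al. Norvig presented several examples. In segmentation, the task is to split a string such as "cheapdealsandstuff" into the most likely sequence of words. Such strings are short enough that every possible segmentation can be enumerated by brute force, but the likelihood of each segmentation must still be evaluated. The simplest approach is to assume that words occur independently of one another. That is, if Pr(w) denotes the frequency of word w in some corpus, we can compute: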

Pr(che, apdeals, andstuff) = Pr(che) × Pr(apdeals) × Pr(andstuff)

...

Pr(cheap, deals, and, stuff) = Pr(cheap) × Pr(deals) × Pr(and) × Pr(stuff)
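A minimal sketch of brute-force segmentation under the independence assumption might look like the following. The probability table `WORD_PROB` holds made-up toy values, not real corpus frequencies, and the length-based penalty for unseen strings in `word_prob` is an assumption added here so that unknown fragments do not win by default. Neither is taken from the book.

```python
from itertools import product
from math import prod

# Hypothetical toy relative frequencies; real values would come from corpus counts.
WORD_PROB = {
    "cheap": 1e-4,
    "deals": 1e-4,
    "deal": 1e-4,
    "and": 1e-2,
    "sand": 1e-5,
    "stuff": 1e-4,
    "a": 1e-2,
}


def word_prob(word):
    """Pr(w): table lookup, with an assumed penalty that shrinks with length for unseen strings."""
    return WORD_PROB.get(word, 1e-6 * 10.0 ** -len(word))


def segmentations(text):
    """Yield every split of text into contiguous, non-empty pieces."""
    if not text:
        return
    # Each of the len(text) - 1 gaps between characters is either a cut or not.
    for cuts in product([False, True], repeat=len(text) - 1):
        pieces, start = [], 0
        for position, cut in enumerate(cuts, start=1):
            if cut:
                pieces.append(text[start:position])
                start = position
        pieces.append(text[start:])
        yield pieces


def score(pieces):
    """Independence assumption: the joint probability is the product of word probabilities."""
    return prod(word_prob(w) for w in pieces)


best = max(segmentations("cheapdealsandstuff"), key=score)
print(best, score(best))
```

With these toy numbers the highest-scoring split is cheap / deals / and / stuff. An 18-character string has 2^17 possible segmentations, so exhaustive enumeration is still feasible at this length.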

We can of course also use n-grams (for example, bigrams):

Pr("cheap ...
