算法的数学原理
21
算法的
数学原理
巨大的变化。对于一个给定的程序和一个给定的值,最坏执行时间就是处理所有规模
n
的数据所需要的最长执行时间。
之所以关注算法的最坏情况,是因为它通常是最容易分析的情况。此外,它还能够说明
程序在各种场景下到底会有多慢。
更正式地说,如果
S
n
是所有规模为
n
的问题样本
s
i
构成的集合,
t
()
代表算法对于每一
个问题样本所需要的执行时间,那么算法在最坏情况下的执行时间为
t
(
s
i
)
对于所有
s
i
S
n
的最大值。我们将
S
n
最坏情况的性能记
T
wc
(
n
)
T
wc
(
n
)
的增长率定义了算
法在最坏情况下的复杂度。
一般来说,通过计算每一份数据
s
i
来判定
算法在哪份数据上表现最坏,这种做法是不切
实际的——没有足够的资源。相反,算法分析人员会精心设计出能让算法的性能落入最
坏情况的问题样本。
2.3.2 平均情况
考虑设计一个支持
n
个电话的电话系统,其中
n
是一个非常大的数——要求在最坏情况
下,系统必须能够完成
n/
2
位用户同时呼叫另外
n/
2
位用户。虽然这个系统永远不会由
于过载而崩溃,但构造它还是需要花费很高的代价。因为现实生活中
n/
2
位用户同时
呼叫另外
n/
2
位用户发生的概率极小。相反,我们可以设计一个非常容易构建的系统,
并借助数学工具来计算由载导致的系统崩溃的概率。
对于规模为
n
的样本集合,我们用一个概率分布
Pr
{
s
i
}
表示样本分布的概率。其中,单
个样本的出现概率为
0
1
,所有样本的概率的和为
1
。更加正式一点的定义为,如果
S
n
是所有规模为
n
的样本集合,那么:
s
S
Pr s
i
=1
如果
t
()
表示算法在单个样本
的执行时间,那么在
S
n
上的平均执行时间是:
exceedingly small. Instead, we could design a system that is cheaper to build and use
mathematical tools to consider the probability of crash due to overload.
For the set of instances of size n, we associate a probability distribution Pr{s
i
}, which
assigns a probability between 0 and 1 to each instance s
i
such that the sum of the
probabilities over all instances of size n is 1. More formally, if S
n
is the set of instan
ces of size n, then:
s
i
S
n
Pr s
i
=1
If t() measures the work done by an algorithm on each instance, then the average-
case work done by an algorithm on S
n
is:
T
ac
n =
s
i
S
n
t s
i
Pr s
i
That is, the actual work done on instance s
i
, t(s
i
), is weighted with the probability
that s
i
will actually be presented as input. If Pr{s
i
} = 0, then the actual value of t(s
i
)
does not impact the expected work done by the program. Denoting this average-
case work on S
n
by T
ac
(n), then the rate of growth of T
ac
(n) defines the average-case
complexity of the algorithm.
Recall that when describing the rate of growth of work or time, we consistently
ignore constants. So when we say that Sequential Search of n elements takes,
on average:
1
2
n +
1
2
probes (subject to our earlier assumptions), then by convention we simply say that
subject to these assumptions, we expect Sequential Search will examine a linear
number of elements, or order n.
Best Case
Knowing the best case for an algorithm is useful even though the situation rarely
occurs in practice. In many cases, it provides insight into the optimal circumstance
for an algorithm. For example, the best case for Sequential Search is when it
searches for a desired value, v, which ends up being the first element in the list.
Consider a slightly different approach, which well call Counting Search, that
counts the number of times that v appears in a list. If the computed count is zero,
then the item was not found, so it returns
false; otherwise, it returns true. Note
that Counting Search always searches through the entire list; therefore, even
though its worst-case behavior is O(n)—the same as Sequential Search—its best-
Mathematics
of
Algorithms
Analysis in the Best, Average, and Worst Cases | 17
也就是说,样本
s
i
的实际执行时间
t
(
s
i
)
会与它作为输入数据的概率加权。如果
Pr
{
s
i
}=0
,那么
t
(
s
i
)
的实际值将不会影响程序的期望执行时间。我们用
T
ac
(
n
)
表示算法在
S
n
上的平均执行时间,那么
T
ac
(
n
)
的增长率则表示算法在平均情况下的复杂度。
回忆一下,在描述工作量和时间的增长率时,我们总是会忽略常数,因此顺序搜索
n

Get 算法技术手册(原书第2 版) now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.