Table 4-7. Results of sorting already-sorted random permutations of the 26 letters of the character set

n        Insertion sort  Merge sort  Quicksort       Hash sort         Heap sort  Quicksort (BFPRT4,
                                     (median-of-3)   (17,576 buckets)             minSize=4)
4,096    0.000029        0.000434    0.00039         0.000552          0.0012     0.0016
8,192    0.000058        0.000932    0.000841        0.001             0.0026     0.0035
16,384   0.000116        0.002       0.0018          0.0019            0.0056     0.0077
32,768   0.000237        0.0041      0.0039          0.0038            0.0123     0.0168
65,536   0.000707        0.0086      0.0085          0.0092            0.0269     0.0364
131,072  0.0025          0.0189      0.0198          0.0247            0.0655     0.0834
Table 4-8. Results of sorting counterexample (killer) data for the median-of-three method

n        Merge sort  Hash sort         Heap sort  Quicksort (BFPRT4,  Quicksort
                     (17,576 buckets)             minSize=4)          (median-of-3)
4,096    0.000505    0.000569          0.0012     0.0023              0.0212
8,192    0.0011      0.0010            0.0026     0.0050              0.0841
16,384   0.0023      0.0019            0.0057     0.0108              0.3344
32,768   0.0047      0.0038            0.0123     0.0233              1.3455
65,536   0.0099      0.0091            0.0269     0.0506              5.4027
131,072  0.0224      0.0283            0.0687     0.1151              38.0950
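A small timing harness in the spirit of these tables can be sketched as follows (our illustration, not the book's benchmark code); it contrasts insertion sort's O(n) behavior on sorted input with its O(n^2) behavior on random input:

```python
# Timing sketch (illustration only, not the book's harness): insertion sort
# is near-linear on already sorted input but quadratic on random input.
import random
import timeit

def insertion_sort(a):
    """Plain insertion sort; sorts the list in place and returns it."""
    for i in range(1, len(a)):
        value = a[i]
        j = i - 1
        while j >= 0 and a[j] > value:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = value
    return a

n = 2000
sorted_input = list(range(n))
random_input = random.sample(range(n), n)

t_sorted = timeit.timeit(lambda: insertion_sort(list(sorted_input)), number=3) / 3
t_random = timeit.timeit(lambda: insertion_sort(list(random_input)), number=3) / 3
print(f"sorted input: {t_sorted:.5f} s, random input: {t_random:.5f} s")
```

The copy via `list(...)` inside each timed call matters: without it, the second and later repetitions would sort an already-sorted list and hit the best case.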
4.10 Analysis Techniques
When analyzing a sorting algorithm, we must explain its best-case, worst-case, and average-case performance (see Chapter 2). The average case is typically the hardest to quantify accurately; it demands advanced mathematical techniques as well as reasonable assumptions about the likely partial ordering of the data. Even when an algorithm has satisfying average-case behavior, it may still be unsuitable for a given application. For each sorting algorithm in this chapter, we analyzed both its theoretical performance and its actual performance in practice.

A fundamental result in computer science is that no comparison-based sorting algorithm can beat O(n log n) performance, in either the average or the worst case. We sketch a proof of this result here. Given n elements, there are n! permutations of them. Every comparison-based sorting algorithm corresponds to a binary decision tree. The leaves of the tree represent the underlying permutations, and every permutation corresponds to at least one leaf in the tree; the nodes along a path from the root to a leaf correspond to a sequence of comparisons. The height of such a tree is the number of comparison nodes in the longest path from the root to a leaf; for example, the height of the tree in Figure 4-10 is 5 because only five comparisons are needed in all cases (although in four cases only four comparisons are needed).
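The claimed height of 5 can be checked directly: a decision tree that sorts n items needs at least n! leaves, so its height is at least ceil(log2(n!)). A quick sketch (ours, not the book's):

```python
# Sketch (illustration only): the minimum worst-case number of comparisons to
# sort n items is ceil(log2(n!)), the least possible decision-tree height.
import math

for n in range(2, 8):
    print(n, math.ceil(math.log2(math.factorial(n))))
# for n = 4: ceil(log2(24)) = 5, matching the height of the tree in Figure 4-10
```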
Construct a binary decision tree where each internal node represents a comparison a_i ≤ a_j and each leaf represents one of the n! permutations. To sort a set of n elements, start at the root and evaluate the statement in each node, traversing to the left child when the statement is true and to the right child otherwise. Figure 4-10 shows a sample decision tree for four elements.

Figure 4-10. Binary decision tree for ordering four elements
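For a smaller case, the decision tree for three items can be written out as an explicit comparison cascade (a minimal sketch of ours, not the book's code); each `if` is an internal node and each `return` is a leaf holding one of the 3! = 6 permutations:

```python
# Decision-tree sort for three items (illustration only): every branch is a
# comparison node, every return a leaf permutation. At most 3 = ceil(log2(6))
# comparisons are made on any path.
def decision_tree_sort3(a, b, c):
    if a <= b:
        if b <= c:
            return [a, b, c]   # a <= b <= c
        elif a <= c:
            return [a, c, b]   # a <= c < b
        else:
            return [c, a, b]   # c < a <= b
    else:
        if a <= c:
            return [b, a, c]   # b < a <= c
        elif b <= c:
            return [b, c, a]   # b <= c < a
        else:
            return [c, b, a]   # c < b <= a

print(decision_tree_sort3(2, 3, 1))  # [1, 2, 3]
```

Note that two of the six leaves are reached after only two comparisons, just as four leaves of the four-element tree in Figure 4-10 are reached after four comparisons rather than five.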
We could construct many different binary decision trees. Nonetheless, given any such binary decision tree for comparing n elements, we can compute its minimum height h; that is, there must be some leaf node whose path from the root contains h comparison nodes. Consider a complete binary tree of height h in which all nonleaf nodes have both left and right children. This tree contains a total of n = 2^h - 1 nodes and has height h = log (n + 1); if the tree is not complete, it could be unbalanced in strange ways, but we know that h ≥ log (n + 1). Any binary decision tree with n! leaf nodes contains at least n! nodes in total, so we need only compute h = log (n!) to determine the height of any such binary decision tree. Here we take advantage of the following properties of logarithms: log (a*b) = log (a) + log (b) and log (x^y) = y*log (x).
h = log (n!) = log (n * (n - 1) * (n - 2) * ... * 2 * 1)
h > log (n * (n - 1) * (n - 2) * ... * n/2)
h > log ((n/2)^(n/2))
h > (n/2) * log (n/2)
h > (n/2) * (log (n) - 1)

Thus, h > (n/2)*(log (n) - 1). What does this mean? Well, given n elements to be sorted, there will be at least one path from the root to a leaf of length h, which means an algorithm that sorts by comparison requires at least this many comparisons to sort the n elements. Note that h is computed by a function f(n); here in particular, f(n) = (1/2)*n*log (n) - n/2, which means any sorting algorithm using comparisons will require O(n log n) comparisons to sort.
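As a numeric sanity check on this derivation, the sketch below (our illustration, not from the book) compares log2(n!) against the (n/2)*(log (n) - 1) bound for a few values of n, computing log2(n!) via the log-gamma function to avoid enormous factorials:

```python
# Sanity check (illustration only): verify log2(n!) >= (n/2) * (log2(n) - 1)
# for several n, computing log2(n!) as lgamma(n + 1) / ln(2).
import math

for n in (8, 64, 1024, 65536):
    log2_fact = math.lgamma(n + 1) / math.log(2)   # log2(n!)
    bound = (n / 2) * (math.log2(n) - 1)
    assert log2_fact >= bound
    print(f"n={n:6d}  log2(n!)={log2_fact:11.1f}  bound={bound:11.1f}")
```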
References

Bentley, J. and M. McIlroy, "Engineering a sort function," Software: Practice and Experience, 23(11): 1249–1265, 1993, http://dx.doi.org/10.1002/spe.4380231105.

Blum, M., R. Floyd, V. Pratt, R. Rivest, and R. Tarjan, "Time bounds for selection," Journal of Computer and System Sciences, 7(4): 448–461, 1973, http://dx.doi.org/10.1016/S0022-0000(73)80033-9.

Cormen, T. H., C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. Third Edition. MIT Press, 2009.

Davis, M. and K. Whistler, "Unicode Collation Algorithm, Unicode Technical Standard #10," June 2015, http://unicode.org/reports/tr10/.

Gilreath, W., "Hash sort: A linear time complexity multiple-dimensional sort algorithm," Proceedings, First Southern Symposium on Computing, 1998, http://arxiv.org/abs/cs.DS/0408040.

Musser, D., "Introspective sorting and selection algorithms," Software: Practice and Experience, 27(8): 983–993, 1997.

Sedgewick, R., "Implementing Quicksort programs," Communications of the ACM, 21(10): 847–857, 1978, http://dx.doi.org/10.1145/359619.359631.

Trivedi, K. S., Probability and Statistics with Reliability, Queueing, and Computer Science Applications. Second Edition. Wiley-Blackwell Publishing, 2001.