Index
Numbers
20NG
defined, 130
shrinkage testing, 156
A
About.com, 125
absolute URLs, 26
from neighboring pages, 185–188
address resolution, 22
agglomerative clustering. See bottom-up clustering
Alexa Internet, 117
Mercator, 20
“ankle-deep semantics,” 289
approximate string matching, 66
Apte-Damerau-Weiss system, 174
Ask Jeeves, 303
aspect models, 109–112, 198–199
authorities
defined, 213
diffusion, unwanted, 234
multiple vectors, 216
rank correlation, 238
score, 213
score, precomputing and, 214
See also hubs
Automatic Resource Compilation (ARC) system, 238
average precision, 54
B
B&H algorithm, 225 ...