Index

numbers

20NG

defined, 130

shrinkage testing, 156

A

About.com, 125

absolute URLs, 26

absorbing features, 185–188

link-derived, 186–187

from neighboring pages, 185–188

textual, 185–186

address resolution, 22

agglomerative clustering. See bottom-up clustering

Alexa Internet, 117

AltaVista, 6–7, 45, 239

Clever comparison, 239–240

Mercator, 20

anchor text, 63, 227–228

“ankle-deep semantics,” 289

approximate string matching, 66

Apte-Damerau-Weiss system, 174

Ask Jeeves, 303

aspect models, 109–112, 198–199

authorities

defined, 213

diffusion, unwanted, 234

multiple vectors, 216

rank correlation, 238

score, 213

score, precomputing and, 214

See also hubs

Automatic Resource Compilation (ARC) system, 238

average precision, 54

B

B&H algorithm, 225 ...
