
Challenges in Crawling the Deep Web
queries. If all orders exhibit a strong correlation, RankingReward($q_j$) is closer to 0. Note that
$\hat{\delta}_j$ is calculated by the estimator in [25], and $c_j$ consists of the network communication
and bandwidth consumption, which can be measured by $f_j$.
4.5 Discussions and Conclusions
In deep web crawling, a query returns multiple documents, which results in duplicates. Reducing
this redundancy is a problem unique to deep web crawling and the source of its challenges.
The major cost is the network traffic, which for small data sources can be measured by the
number of queries. For large data sources such as online social ...