Big Data

April 2016

Beginner to intermediate

463 pages

18h 53m

English

Challenges in Crawling the Deep Web

 117

queries. If all orders exhibit a strong correlation, RankingReward(q) is closer to 0. Note that ... is calculated by the estimator in [25], and c consists of network communication and bandwidth consumption, which can be measured by f ...

4.5 Discussions and Conclusions

In deep web crawling, each query returns multiple documents, and the documents returned by different queries overlap, which results in duplicates. Reducing this redundancy is a problem unique to deep web crawling and the source of its challenges. The major cost is the network traffic, which for small data sources can be measured by the number of queries. For large data sources such as online social ...
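The redundancy and query-count cost discussed in this section can be illustrated with a minimal sketch. It assumes each query's result is available as a list of document identifiers; the function name `crawl_cost` and the sample data are hypothetical and are not taken from the chapter.

```python
def crawl_cost(query_results):
    """Summarize a crawl from per-query lists of returned document ids.

    Illustrative only: the cost is taken to be the number of queries issued,
    and redundancy is counted as returned documents that were already seen.
    """
    seen = set()
    total_returned = 0
    for doc_ids in query_results:
        total_returned += len(doc_ids)  # every returned document costs traffic
        seen.update(doc_ids)            # only new ids enlarge the harvested set
    unique = len(seen)
    return {
        "queries": len(query_results),
        "returned": total_returned,
        "unique": unique,
        "duplicates": total_returned - unique,
    }


# Hypothetical results of three queries against a small data source.
results = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d4"],
    ["d1", "d4"],
]
summary = crawl_cost(results)
print(summary)
```

In this toy run, three queries return eight documents but only four distinct ones, so half of the transferred documents are duplicates. A query selection strategy would aim to keep that share low for a given number of queries.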

Kuan-Ching Li, Hai Jiang, Laurence T. Yang, Alfredo Cuzzocrea

ISBN: 9781498734875