Big Data

by Fei Hu
April 2016
Beginner to intermediate
463 pages
18h 53m
English
Auerbach Publications
Content preview from Big Data
Big Data: Storage, Sharing, and Security
performance of a deep web crawler. For instance, suppose that there are two data sources, A
and B, containing 1,000 and 100,000 documents, respectively. Crawling A with 100% coverage
costs the crawler 5,000 matched documents, while crawling B with 100% coverage costs the
same crawler 500,000 matched documents. The crawler then performs identically on A and B,
since the overlapping rate is 5 in both cases (5,000/1,000 for A and 500,000/100,000 for B).
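The arithmetic behind this comparison can be checked with a short Python sketch. It assumes the overlapping rate is the number of matched documents returned divided by the number of unique documents retrieved, which at 100% coverage equals the size of the data source. The function name `overlapping_rate` and its parameters are illustrative and do not come from the text.

```python
def overlapping_rate(matched_documents: int, unique_documents: int) -> float:
    """Assumed definition: matched documents returned per unique document retrieved."""
    if unique_documents <= 0:
        raise ValueError("unique_documents must be positive")
    return matched_documents / unique_documents


# At 100% coverage, every document in the source is retrieved at least once,
# so the unique-document count equals the source size.
rate_a = overlapping_rate(matched_documents=5_000, unique_documents=1_000)
rate_b = overlapping_rate(matched_documents=500_000, unique_documents=100_000)

print(rate_a, rate_b)  # 5.0 5.0
```

Under this assumption the rate is independent of source size, which is why the crawler is judged equally efficient on both sources even though the raw costs differ by a factor of 100.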
More formally, given a document-term bipartite graph $G = (D, Q, E)$, a set of queries
$Q_s \subseteq Q$ selected by a crawler forms a subgraph denoted by $G_s = (D_s, Q_s, E \ldots$

ISBN: 9781498734875