Large Scale and Big Data
by Sherif Sakr, Mohamed Gaber
June 2014
Content level: Intermediate to advanced
636 pages
23h 13m
English
Auerbach Publications
Advanced Algorithms for Efficient Approximate Duplicate Detection
stabilize even at 3M records. This demonstrates that RSBF has much better con-
vergence rate than SBF.
Figure 13.7 similarly compares, for the synthetic data set, how much the number of 1s in each filter changes between successive record counts. With 512 KB of memory, this difference stabilizes to zero faster for RSBF (shortly after 50 million records) than for SBF, which has not yet stabilized even at 455 million records.
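
The convergence measure described above (the change in the number of 1s between successive record counts, and the point at which that change settles to zero) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `tolerance` parameter, and the sample counts are hypothetical and are not taken from the book or from any RSBF or SBF implementation.

```python
from typing import List, Optional, Sequence


def successive_differences(ones_counts: Sequence[int]) -> List[int]:
    """Absolute change in the number of 1s between consecutive checkpoints."""
    return [abs(b - a) for a, b in zip(ones_counts, ones_counts[1:])]


def stabilization_index(ones_counts: Sequence[int], tolerance: int = 0) -> Optional[int]:
    """Return the first checkpoint index from which every later successive
    difference stays within `tolerance`, or None if the series has not
    settled by the last checkpoint."""
    diffs = successive_differences(ones_counts)
    last_unstable = -1
    for i, d in enumerate(diffs):
        if d > tolerance:
            last_unstable = i
    stable_from = last_unstable + 1
    # If the final difference is still above tolerance (or there are no
    # differences at all), the series cannot be called stable.
    if stable_from >= len(diffs):
        return None
    return stable_from


# Made-up counts of 1s sampled at equally spaced checkpoints (illustrative only).
fast_filter = [10, 40, 55, 60, 60, 60]   # settles quickly
slow_filter = [10, 20, 30, 40, 50, 60]   # still changing at the last checkpoint

fast_point = stabilization_index(fast_filter)
slow_point = stabilization_index(slow_filter)
```

With these invented series, the first filter is reported as stable from checkpoint 3 onward, while the second returns `None`. That mirrors the qualitative pattern the text describes: one filter settles early and the other has not settled by the end of the stream.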
[Figure: RSBF versus SBF. Y-axis: "Difference in fraction ..." (label truncated in source). Panel titles: |Data set| = 3,367,020, Memory = 2 KB, Threshold FPR = 0.1; |Data set| = 3,367,020, Memory = 4 KB, Threshold FPR = 0.1.]
ISBN: 9781466581500