
Advanced Algorithms for Efficient Approximate Duplicate Detection
stabilize even at 3M records. This demonstrates that RSBF has a much better convergence rate than SBF.
Figure 13.7 similarly compares the difference in the number of 1s between successive checkpoints (numbers of records processed) for the synthetic data set. With 512 KB of memory, this difference stabilizes at zero much sooner for RSBF (shortly after 50 million records) than for SBF, which has not yet stabilized even at 455 million records.
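To make the convergence metric concrete, the following is a minimal Python sketch, not the experimental code behind the figures. It uses a binary Stable Bloom Filter (every cell capped at 1) as the filter under test, because RSBF's insertion policy is not described in this section. The class `StableBloomFilter`, the helper `ones_diff_series`, and the parameters `m`, `k`, `p` and `checkpoint` are hypothetical names chosen for illustration. The sketch feeds a stream through the filter and records, at each checkpoint, the change in the fraction of 1s since the previous checkpoint. Taking the absolute value of that change is an assumption. A filter that has converged produces values close to zero.

```python
import hashlib
import random


class StableBloomFilter:
    """Hypothetical binary Stable Bloom Filter sketch (cells hold 0 or 1).

    Per element: probe k hashed cells (duplicate if all are set),
    decrement p randomly chosen cells, then set the k hashed cells.
    """

    def __init__(self, m, k, p, seed=0):
        self.m, self.k, self.p = m, k, p
        self.cells = bytearray(m)          # one cell per byte, value 0 or 1
        self.rng = random.Random(seed)     # drives the random decrements

    def _positions(self, item):
        # Derive k indices from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def process(self, item):
        pos = self._positions(item)
        duplicate = all(self.cells[j] for j in pos)    # probe before updating
        for _ in range(self.p):                        # evict stale information
            j = self.rng.randrange(self.m)
            if self.cells[j]:
                self.cells[j] -= 1
        for j in pos:                                  # record this element
            self.cells[j] = 1
        return duplicate

    def ones(self):
        return sum(self.cells)


def ones_diff_series(filt, stream, checkpoint):
    """Return |fraction of 1s now - fraction at previous checkpoint| per checkpoint.

    Hypothetical helper; the absolute value is an assumption of this sketch.
    """
    diffs, prev = [], filt.ones() / filt.m
    for n, item in enumerate(stream, start=1):
        filt.process(item)
        if n % checkpoint == 0:
            cur = filt.ones() / filt.m
            diffs.append(abs(cur - prev))
            prev = cur
    return diffs


if __name__ == "__main__":
    sbf = StableBloomFilter(m=4096, k=3, p=10, seed=1)
    data = random.Random(2)
    stream = (data.randrange(50_000) for _ in range(200_000))
    print([round(d, 4) for d in ones_diff_series(sbf, stream, checkpoint=20_000)])
```

Because `ones_diff_series` relies only on `process`, `ones` and `m`, any filter that exposes the same three members could be measured the same way, which is what comparing two filters on one plot requires.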
[Figure: RSBF vs. SBF; panels "|Data set| = 3,367,020, Memory = 2 KB, Threshold FPR = 0.1" and "|Data set| = 3,367,020, Memory = 4 K..."; y-axis: "Difference in fraction ..."]