
419Advanced Algorithms for Efcient Approximate Duplicate Detection
13.5.1.1 False-Positive Rate
A false positive, FP, occurs when a distinct element of the stream is reported as a
duplicate. Consider the FP of e
m+1
, the (m + 1)th element of the stream. The elements
of the stream are assumed to be uniformly drawn at random from a nite universe
Γ, with |Γ| = U.
Let P
unique
be the probability that e
m+1
has not occurred in the rst m elements of the
stream, and let e
m+1
hash to H = {h
1
, h
2
,…, h
k
} positions, where h
i
∈ [1, s] for the ith
Bloom lter. e
m+1
will be reported as a duplicate when all the bit positions in H are set
to 1 after the