102 Knowledge Discovery from Data Streams
stream.
Here, we discuss a somewhat different problem. Given a stream S of n
items t
1
, . . . , t
n
, find those items whose frequency is greater than φ ×N . The
frequency of an item i is f
i
= |{j|t
j
= i}|. The exact φ-frequent items comprise
the set {i|f
i
> φ × N}. Heavy hitters are in fact singleton items.
Suppose, φ = 0.5, i.e., we want to find a majority element. An algorithm to
solve this problem can be stated as follows: store the first item and a counter,
initialized to 1. For each subsequent item, if it is the same as the currently
stored item, increment the counter. If it differs, and the counter is zero, then
store the new item and set the counter to 1; else, decrement the counter. After
processing all items, ...