Implementing Bloom Filters in Python
Date: This event took place live on December 01 2015
Presented by: George T. Heineman
Duration: Approximately 90 minutes.
Questions? Please send email to
A fundamental operation in Information Retrieval is to look up information in a repository by a key value. This operation may be costly since it might access a database or make a network request to fetch the data, which may or may not be present in the repository. One way to improve efficiency is to use a probabilistic data structure known as a Bloom Filter to test whether an element is a member of a set. A Bloom filter can accurately determine when an element does not belong to the set, in which case the costly access operation can be prevented. Even more impressive, it operates in constant time with fixed storage independent of the number of elements in the set. However, Bloom Filters may return a false positive even when the element does not exist; fortunately this false positive rate is predictable.
I will implement a Bloom Filter in Python and demonstrate its predictable behavior using several examples.
About George Heineman
George Heineman is an Associate Professor of Computer Science at WPI. His research interests are in Software Engineering. He co-edited the 2001 book "Component-Based Software Engineering: Putting the Pieces Together". He was the Program Chair for the 2005 International Symposium on Component-Based Software Engineering.