Implementing fuzzy_hasher.py

This script was tested with Python 2.7.15 and 3.7.1 and doesn't use any third-party libraries.

We'll get to the internals of the fuzzy hashing algorithm shortly, but let's start our script as we have the others. We begin with our imports, all standard libraries that we've used before, as shown in the following. We also define a set of constants on lines 36 through 47. Lines 37 and 38 define our signature alphabet, in this case all of the base64 characters. The next set of constants is used in the spamsum algorithm to generate the hash. CONTEXT_WINDOW defines the amount of the file we'll read for our rolling hash. FNV_PRIME is used to calculate the hash, while HASH_INIT sets a starting value for our ...
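To make the role of these constants concrete, here is a minimal sketch of how they might be declared and used. The names ALPHABET, CONTEXT_WINDOW, FNV_PRIME, and HASH_INIT follow the description above, but the values shown are assumptions borrowed from the commonly published spamsum reference implementation, not a copy of the script's own listing. The helper functions fnv_step() and signature_char() are hypothetical and exist only for illustration.

```python
import string

# Assumed signature alphabet: the 64 base64 characters
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase +
            string.digits + "+/")

# Assumed values, taken from the spamsum reference implementation
CONTEXT_WINDOW = 7        # bytes considered by the rolling hash
FNV_PRIME = 0x01000193    # multiplier for the FNV-style hash
HASH_INIT = 0x28021967    # starting value for the hash


def fnv_step(current_hash, byte_value):
    """Hypothetical helper: fold one byte into the running hash.

    Multiplies by the prime, masks to 32 bits, then XORs the byte.
    """
    return ((current_hash * FNV_PRIME) & 0xFFFFFFFF) ^ byte_value


def signature_char(hash_value):
    """Hypothetical helper: map a hash value to one alphabet character."""
    return ALPHABET[hash_value % len(ALPHABET)]


if __name__ == "__main__":
    running = HASH_INIT
    for byte_value in bytearray(b"example data"):
        running = fnv_step(running, byte_value)
    print(signature_char(running))
```

Because the alphabet holds exactly 64 characters, any hash value reduces to a single signature character with a modulo operation, which is what keeps the resulting signature short and printable.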
