Context Triggered Piecewise Hashing (CTPH)

As you probably guessed, this is where CTPH comes into play. Essentially, we use this technique to calculate reset points. Reset points, in this case, are boundaries similar to the 4-byte boundaries we used in the prior example: they determine how much of a file we summarize at a time. The notable difference is that we pick the boundaries based on file content (our context triggering) rather than fixed windows. What this means is that we use a rolling hash, as employed by ssdeep and spamsum, to calculate values throughout the file; whenever the rolling hash produces a specific trigger value, a boundary line is drawn and the content since the prior boundary is summarized (the piecewise hash). In ...
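
To make the idea of content-defined reset points concrete, here is a minimal sketch. It is not the ssdeep or spamsum algorithm: the rolling hash is a plain sum over a small window, the trigger rule and the constants WINDOW_SIZE and TRIGGER_MODULUS are hypothetical values chosen for illustration, and each chunk is summarized with the first two hex characters of its SHA-256 digest as a stand-in for a real piecewise hash.

```python
import hashlib

# Assumed values for illustration only; real tools choose these differently.
WINDOW_SIZE = 7        # number of trailing bytes the rolling hash covers
TRIGGER_MODULUS = 16   # a boundary is drawn when rolling % 16 == 15


def find_reset_points(data, window=WINDOW_SIZE, modulus=TRIGGER_MODULUS):
    """Return the end offsets (exclusive) of chunks chosen by content."""
    reset_points = []
    rolling = 0
    for i, byte in enumerate(data):
        # Slide the window: add the newest byte, drop the oldest one.
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        # Context trigger: the rolling value hit the chosen trigger value.
        if rolling % modulus == modulus - 1:
            reset_points.append(i + 1)
    return reset_points


def piecewise_hash(data):
    """Summarize each chunk between reset points and join the pieces."""
    signature = []
    start = 0
    # Treat the end of the data as a final boundary for any leftover bytes.
    for end in find_reset_points(data) + [len(data)]:
        if end > start:
            chunk = data[start:end]
            # Stand-in summary: two hex characters per chunk.
            signature.append(hashlib.sha256(chunk).hexdigest()[:2])
            start = end
    return "".join(signature)


if __name__ == "__main__":
    original = b"The quick brown fox jumps over the lazy dog. " * 20
    modified = b"INSERTED " + original
    print(piecewise_hash(original))
    print(piecewise_hash(modified))
```

Because the rolling value depends only on the last few bytes, inserting data near the start of a file shifts the later reset points by the length of the insertion instead of changing where they fall in the content. The chunks after the insertion therefore line up again, which a fixed-window approach cannot guarantee.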
