Context Triggered Piecewise Hashing (CTPH)

As you probably guessed, this is where CTPH comes into play. Essentially, this technique lets us calculate reset points. Reset points, in this case, are boundaries similar to the 4-byte boundaries we used in the prior example: they determine how much of the file each summary covers. The key difference is that we choose the boundaries based on the file's content (our context triggering) rather than on fixed windows. In practice, we calculate a rolling hash, as employed by ssdeep and spamsum, at each position as we move through the file. Whenever the rolling hash produces a specific trigger value, a boundary is drawn, and the content since the previous boundary is summarized (the piecewise hash). In ...