How it works...
In the literature and industry, it has been determined that the most frequent N-grams are also the most informative ones for a malware classification algorithm. For this reason, in this recipe, we will write functions to extract them for a file. We start by importing some helpful libraries for our extraction of N-grams (step 1). In particular, we import the collections library and the ngrams library from nltk. The collections library allows us to convert a list of N-grams to a frequency count of the N-grams, while the ngrams library allows us to take an ordered list of bytes and obtain a list of N-grams. We specify the file we would like to analyze and write a function that will read all of the bytes of a given file (steps ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access