November 2019
Intermediate to advanced
346 pages
9h 36m
English
An important motivation for this recipe is that we can't rely on the IP address to identify a device, since this value can be spoofed. Instead, we analyze the traffic's high-level data, that is, its metadata and traffic statistics rather than its content, to determine whether the device belongs to the network. We begin by reading in the training and testing datasets. We then featurize these and perform a quick data exploration step by observing the classification labels (step 2). Because the classifier needs numerical targets, we convert these categorical labels into numerical ones (step 3). Having featurized the data in step 4 and step 5, we instantiate, ...
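The first three steps (loading the data, inspecting the label distribution, and encoding the categorical labels as integers) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the recipe's own code: the inline CSV data, the column names (`ack_count`, `packet_size_mean`, `ttl_mean`, `device_category`), and the helpers `read_dataset` and `fit_label_encoder` are all hypothetical stand-ins. The encoder assigns integer codes in sorted label order, which matches how scikit-learn's `LabelEncoder` orders its classes.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data standing in for the recipe's training and testing CSVs.
# Column names are assumptions: numeric traffic statistics plus a label column.
TRAIN_CSV = """ack_count,packet_size_mean,ttl_mean,device_category
12,540.0,64.0,smart_tv
3,88.5,128.0,thermostat
15,610.2,64.0,smart_tv
4,92.1,128.0,thermostat
7,300.0,255.0,security_camera
"""

TEST_CSV = """ack_count,packet_size_mean,ttl_mean,device_category
11,555.0,64.0,smart_tv
5,90.0,128.0,thermostat
"""

LABEL_COLUMN = "device_category"  # hypothetical name of the label column


def read_dataset(text):
    """Parse CSV text into (features, labels): float feature rows and string labels."""
    rows = list(csv.DictReader(io.StringIO(text)))
    features = [[float(v) for k, v in row.items() if k != LABEL_COLUMN] for row in rows]
    labels = [row[LABEL_COLUMN] for row in rows]
    return features, labels


def fit_label_encoder(labels):
    """Map each distinct category to an integer code, assigned in sorted label order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Step 1: read in the training and testing datasets.
X_train, y_train = read_dataset(TRAIN_CSV)
X_test, y_test = read_dataset(TEST_CSV)

# Step 2: quick exploration of the classification labels (class counts).
print(Counter(y_train))

# Step 3: fit the encoding on the training labels only, then apply it to both splits.
encoder = fit_label_encoder(y_train)
y_train_encoded = [encoder[label] for label in y_train]
y_test_encoded = [encoder[label] for label in y_test]
print(encoder)
print(y_train_encoded, y_test_encoded)
```

Fitting the mapping on the training labels and reusing it for the test labels keeps the two splits consistent. As with `LabelEncoder`, a test label that never appears in training has no code, and this sketch raises a `KeyError` for it rather than silently inventing one.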