How to do it…

  1. Begin by reading in the pickled data:
import pickle
# Open the file in binary mode; the with block closes it automatically
with open('CTU13Scenario1flowData.pickle', 'rb') as file:
    botnet_dataset = pickle.load(file)
  2. The data is already split into training and test sets, so you only need to assign each part to its respective variable:
X_train, y_train, X_test, y_test = (
    botnet_dataset[0],
    botnet_dataset[1],
    botnet_dataset[2],
    botnet_dataset[3],
)
  3. Instantiate a decision tree classifier with default parameters:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
  4. Fit the classifier to the training data:
clf.fit(X_train, y_train)
  5. Test it on the test set:
# score() returns the mean accuracy on the given data
clf.score(X_test, y_test)

The following is the output:

0.9991001799640072
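
If the CTU-13 pickle file is not at hand, the same steps can be tried on stand-in data. The following is only a minimal sketch under stated assumptions: the data is synthetic (generated with scikit-learn's make_classification, not taken from CTU-13), the four-element list layout is inferred from the indexing used in the steps above, and random_state=0 is added purely so that repeated runs agree. It also checks that score() is the fraction of test samples predicted correctly.

```python
import io
import pickle

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flow data (assumption: NOT the CTU-13 dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=150, random_state=0)

# Assumed layout, mirroring the recipe: [X_train, y_train, X_test, y_test].
# An in-memory buffer stands in for the pickle file on disk.
buffer = io.BytesIO()
pickle.dump([X_tr, y_tr, X_te, y_te], buffer)
buffer.seek(0)

# Steps 1-2: load the pickled object and unpack the four parts.
X_train, y_train, X_test, y_test = pickle.load(buffer)

# Steps 3-4: instantiate and fit (random_state is an addition for repeatability).
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Step 5: score() reports mean accuracy on the test set.
accuracy = clf.score(X_test, y_test)

# The same number computed by hand: share of correct predictions.
manual_accuracy = float((clf.predict(X_test) == y_test).mean())

print(f"score():  {accuracy:.4f}")
print(f"by hand:  {manual_accuracy:.4f}")
```

The accuracy printed here says nothing about the botnet data; it only confirms that the load, unpack, fit, and score sequence runs end to end.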
