Revisiting malicious URL detection with decision trees

We now revisit the problem of detecting malicious URLs, this time solving it with decision trees. We start by loading the data:

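    from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
    import pandas as pd

    # Load the URL dataset and check its dimensions
    urls = pd.read_json("../data/urls.json")
    print(urls.shape)  # (5000, 3)

    # Prepend a scheme so that every entry is a complete URL
    urls['string'] = "http://" + urls['string']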
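The loading step prepends "http://" to every entry, which suggests the raw strings are stored without a scheme. The following is a minimal sketch of why that prefix matters when a URL is later passed to urlparse. It assumes the raw value is the bare form "startbuyingstocks.com/", and it uses the Python 3 import path (in Python 2 the same function lives in the urlparse module).

```python
from urllib.parse import urlparse

# Assumed raw form of the first entry, before the scheme is prepended
bare = "startbuyingstocks.com/"
prefixed = "http://" + bare

parsed_bare = urlparse(bare)
parsed_prefixed = urlparse(prefixed)

# Without a scheme, urlparse treats the host name as part of the path
print(repr(parsed_bare.netloc), repr(parsed_bare.path))

# With the scheme, the host name is recognized as the network location
print(repr(parsed_prefixed.netloc), repr(parsed_prefixed.path))
```

Comparing the two printed lines shows that the host name is only available through the netloc attribute once the scheme is present.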

We then print the first rows of urls:

urls.head(10)

The output looks as follows:

   pred          string                         truth
0  1.574204e-05  http://startbuyingstocks.com/  0
1  1.840909e-05  http://qqcvk.com/              0
2  1.842080e-05  http://432parkavenue.com/      0
3  7.954729e-07  http://gamefoliant.ru/         0
4  3.239338e-06  http://orka.cn/                0
5  3.043137e-04  http://media2.mercola.com/ ...
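A decision tree needs numeric inputs, so each URL string has to be turned into features before training. The sketch below shows one possible way to do that with urlparse. The function name lexical_features and the three features it computes are hypothetical choices made for illustration, not necessarily the features used in the original analysis.

```python
from urllib.parse import urlparse


def lexical_features(url):
    """Return a few simple numeric features for one URL (illustrative only)."""
    host = urlparse(url).netloc
    return {
        "host_length": len(host),                         # characters in the host name
        "digit_count": sum(ch.isdigit() for ch in host),  # digits in the host name
        "dot_count": host.count("."),                     # separators between labels
    }


# Two URLs taken from the table above
for url in ["http://startbuyingstocks.com/", "http://432parkavenue.com/"]:
    print(url, lexical_features(url))
```

Each dictionary can become one row of a feature matrix, with the truth column serving as the label.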
