We now revisit the problem of detecting malicious URLs, this time solving it with decision trees. We start by loading the data:
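from urllib.parse import urlparse  # Python 3; in Python 2 this was `from urlparse import urlparse`
import pandas as pd

# Load the URL dataset and check its dimensions
urls = pd.read_json("../data/urls.json")
print(urls.shape)  # (5000, 3)

# Prepend a scheme so that each string is a complete URL
urls['string'] = "http://" + urls['string']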
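The text does not say why "http://" is prepended to every string. One plausible reason, stated here as an assumption, is that `urlparse` only recognizes the host part of a URL when a scheme (or a leading `//`) is present. The following minimal sketch illustrates that behavior with Python 3's `urllib.parse`; the helper `host_of` and the sample URL are hypothetical and are not part of the dataset.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the network location (host) that urlparse extracts from url."""
    return urlparse(url).netloc


# Hypothetical example string, stored without a scheme
bare = "example.com/login"
with_scheme = "http://" + bare

# Without a scheme, the whole string is treated as a path and the host is empty
print(repr(host_of(bare)))

# With the scheme, the host and the path are separated as expected
print(repr(host_of(with_scheme)))
print(repr(urlparse(with_scheme).path))
```

With the scheme in place, any later feature extraction that relies on the host or path components of the parsed URL receives the values one would expect.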
We then print the first 10 rows of urls:
urls.head(10)
The output looks as follows:
  | pred         | string | truth
0 | 1.574204e-05 |        | 0
1 | 1.840909e-05 |        | 0
2 | 1.842080e-05 |        | 0
3 | 7.954729e-07 |        | 0
4 | 3.239338e-06 |        | 0
5 | 3.043137e-04 |        |