November 2019
Intermediate to advanced
346 pages
9h 36m
English
We begin by downloading the dataset and reading it into data frames (Steps 1 and 2) for convenient examination and manipulation. We then place the dataset into arrays in preparation for machine learning (Steps 3 and 4). The dataset consists of several thousand feature vectors for phishing URLs. There are 30 features, whose names and values are tabulated here:
| Attributes | Values | Column name |
|---|---|---|
| Having an IP address | { 1,0 } | has_ip |
| Having a long URL | { 1,0,-1 } | long_url |
| Uses Shortening Service | { 0,1 } | short_service |
| Having the '@' symbol | { 0,1 } | has_at |
| Double slash redirecting | { 0,1 } | double_slash_redirect |
| Having a prefix and suffix | { -1,0,1 } | pref_suf |
| Having a subdomain ... | | |
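
The flow described above (read the data into a data frame, then move it into arrays) can be sketched as follows. This is a minimal illustration rather than the recipe's actual code: the inline CSV stands in for the downloaded file, its rows are made up, only six of the 30 columns from the table are used, and the `label` column name and the `load_features_and_labels` helper are assumptions introduced for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: a few made-up rows that use
# six of the column names from the table, plus an assumed "label" column.
SAMPLE_CSV = """has_ip,long_url,short_service,has_at,double_slash_redirect,pref_suf,label
1,-1,0,0,0,-1,1
0,1,1,0,1,1,0
1,0,0,1,0,0,1
0,-1,1,0,0,1,0
"""


def load_features_and_labels(csv_source, label_column="label"):
    """Read a CSV into a data frame, then split it into NumPy arrays."""
    # Data frame: convenient for examining and manipulating the data.
    df = pd.read_csv(csv_source)
    # Arrays: the form most machine learning libraries expect.
    X = df.drop(columns=[label_column]).to_numpy()
    y = df[label_column].to_numpy()
    return df, X, y


df, X, y = load_features_and_labels(io.StringIO(SAMPLE_CSV))
print(df.head())
print(X.shape, y.shape)
```

With the real dataset, the `io.StringIO` object would be replaced by the path to the downloaded file, and the label column name would need to match whatever that file actually uses.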