The first step is to choose the data that will be used for classification. We have chosen a dataset from the UK Government data website at http://data.gov.uk/dataset/road-accidents-safety-data.
The dataset is called Road Safety - Digital Breath Test Data 2013. It downloads as a zipped text file called DigitalBreathTestData2013.txt, which contains around half a million rows. The data looks as follows:
Reason,Month,Year,WeekType,TimeBand,BreathAlcohol,AgeBand,Gender
Suspicion of Alcohol,Jan,2013,Weekday,12am-4am,75,30-39,Male
Moving Traffic Violation,Jan,2013,Weekday,12am-4am,0,20-24,Male
Road Traffic Collision,Jan,2013,Weekend,12pm-4pm,0,20-24,Female
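To make the record layout concrete, the following is a minimal sketch of how rows in this format could be read, using Python's standard csv module. The names SAMPLE and parse_rows are hypothetical and used only for illustration. The sketch assumes that Year and BreathAlcohol hold integers in every row, which is true of the three sample rows shown but has not been checked against the full file.

```python
import csv
import io

# The header and the three sample rows shown above, one record per line.
SAMPLE = """Reason,Month,Year,WeekType,TimeBand,BreathAlcohol,AgeBand,Gender
Suspicion of Alcohol,Jan,2013,Weekday,12am-4am,75,30-39,Male
Moving Traffic Violation,Jan,2013,Weekday,12am-4am,0,20-24,Male
Road Traffic Collision,Jan,2013,Weekend,12pm-4pm,0,20-24,Female
"""


def parse_rows(text):
    """Parse comma-separated text into a list of dicts keyed by column name.

    Assumption: Year and BreathAlcohol are integers in every row.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row["Year"] = int(row["Year"])
        row["BreathAlcohol"] = int(row["BreathAlcohol"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row["Reason"], row["BreathAlcohol"], row["Gender"])
```

For the full download, the same function could be applied to the contents of DigitalBreathTestData2013.txt once it has been unzipped, provided the integer assumption holds for all rows.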
In order to classify the data, we have modified both the column ...