- For the Naive Bayes exercise, we use the well-known Iris dataset (iris.data), which can be obtained from the UCI Machine Learning Repository. The dataset was introduced by R. A. Fisher in 1936. It is a multivariate dataset of flower measurements in which each sample is labeled as one of three species.
In short, given four measurements per flower, we attempt to classify each sample as one of the three Iris species (that is, Iris Setosa, Iris Versicolor, or Iris Virginica).
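To make the shape of the data concrete, here is a minimal sketch of turning one row into four numeric measurements plus a species name. It assumes each row is a comma-separated line whose first four fields are the measurements and whose fifth is the species string. The names `IrisRecord`, `parse_iris_line`, and `SAMPLE_ROWS` are hypothetical helpers introduced only for this illustration, and the rows are embedded inline rather than read from the downloaded file.

```python
from typing import List, NamedTuple


class IrisRecord(NamedTuple):
    """One flower: four measurements in cm plus the species name."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


def parse_iris_line(line: str) -> IrisRecord:
    """Split one comma-separated row into four floats and a species string."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    sl, sw, pl, pw = (float(x) for x in fields[:4])
    return IrisRecord(sl, sw, pl, pw, fields[4])


# Illustrative rows in the assumed layout (not read from disk here).
SAMPLE_ROWS: List[str] = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "6.3,3.3,6.0,2.5,Iris-virginica",
]

records = [parse_iris_line(row) for row in SAMPLE_ROWS]
for rec in records:
    features = list(rec[:4])  # the four measurements used as classifier input
    print(features, "->", rec.species)
```

Keeping the species as a string at this stage separates parsing from label encoding, so the numeric encoding can be applied as its own step.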
We can download the data from here:
The column definition is as follows:
- Sepal length in cm
- Sepal width in cm
- Petal length in cm
- Petal width in cm
- Class (species), encoded as a number:
  - Iris Setosa => Replace it with 0
  - Iris Versicolor => Replace it with 1