- Iris data: This is arguably the most classic dataset in machine learning, and perhaps in all of statistics. It records four measurements (sepal length, sepal width, petal length, and petal width) for three species of iris flower: Iris setosa, Iris virginica, and Iris versicolor. There are 150 observations in total, 50 for each species. To load the dataset in Python, we will use scikit-learn's datasets module, as follows:
      from sklearn import datasets
      iris = datasets.load_iris()
      print(len(iris.data))
      # 150
      print(len(iris.target))
      # 150
      print(iris.data[0])  # Sepal length, Sepal width, Petal length, Petal width
      # [ 5.1 3.5 1.4 0.2]
      print(set(iris.target))  # I. setosa, I. virginica, ...
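The integer labels in `iris.target` are easier to read once they are mapped back to species names. The following is a minimal sketch of one way to do that, using the `feature_names` and `target_names` attributes of the object returned by `load_iris()`. The variable names `species_names` and `species_counts` are our own, chosen for illustration.

```python
from collections import Counter

from sklearn import datasets

iris = datasets.load_iris()

# Column names for the four measurements, as provided by scikit-learn
feature_names = list(iris.feature_names)

# target_names[i] is the species name for integer label i
species_names = [str(name) for name in iris.target_names]

# Count how many rows carry each label, keyed by species name
species_counts = Counter(species_names[int(label)] for label in iris.target)

print(feature_names)
print(dict(species_counts))
```

Counting the labels this way is also a quick check that the classes are balanced, which is worth confirming before fitting any classifier.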