March 2017
Beginner to intermediate
866 pages
18h 4m
English
In contrast to the heterogeneous domains and applications of machine learning, the data representation in scikit-learn is less diverse, and the basic format that many algorithms expect is straightforward—a matrix of samples and features.
The underlying data structure is the NumPy ndarray. Each row in the matrix corresponds to one sample, and each column holds the values of one feature.
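To make the row and column convention concrete, here is a minimal sketch using a small NumPy array. The values and the variable names (X, first_sample, second_feature) are made up for illustration and are not taken from the text.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows) with 2 features (columns) each.
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
])

# The shape of the matrix is (number of samples, number of features).
n_samples, n_features = X.shape

first_sample = X[0]        # one row: every feature value of sample 0
second_feature = X[:, 1]   # one column: feature 1 across all samples

print(type(X).__name__, X.shape)
```

Indexing a row returns a single sample, while slicing a column returns one feature across all samples. This is the layout that many scikit-learn estimators expect as input.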
Machine learning datasets have their own counterpart to Hello World: the Iris dataset, whose origins date back to 1936. The standard installation of scikit-learn already gives you access to a couple of datasets, including Iris, which consists of 150 samples, each consisting of four measurements ...
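As a brief sketch of how the bundled Iris data can be inspected, the snippet below uses load_iris from sklearn.datasets. It assumes scikit-learn is installed; the variable names are chosen here for illustration.

```python
from sklearn.datasets import load_iris

# Load the copy of Iris that ships with scikit-learn.
iris = load_iris()

X = iris.data      # feature matrix: one row per sample, one column per feature
y = iris.target    # one integer class label per sample

print(X.shape)             # (number of samples, number of features)
print(iris.feature_names)  # names of the measured features
```

The feature matrix follows the samples-by-features layout described above, so it can be passed to an estimator without reshaping.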