March 2018
Intermediate to advanced
816 pages
19h 35m
English
Python offers many libraries for statistics, data mining, and machine learning. Probably the best known is the scikit-learn package, which provides most of the commonly used algorithms along with tools for data preparation and model evaluation.
In scikit-learn, you work with data in a tabular representation, typically by using pandas data frames. The input table (actually a two-dimensional array, not a table in the relational sense) has columns used to train the model. These columns, or attributes, represent features, so this table is also called the features matrix. There is no prescribed naming convention; however, in most of the Python code you will note that this features matrix is stored ...
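As a minimal sketch of the features matrix described above, the following example builds a small pandas data frame, selects the feature columns, and trains a model on them. The data values, the column names (`age`, `income`, `buyer`), the variable names (`features`, `target`), and the choice of a decision tree classifier are all assumptions made for illustration; they do not come from the text.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is one case, each column one attribute.
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23],
    "income": [30, 45, 80, 62, 90, 28],
    "buyer": [0, 0, 1, 1, 1, 0],
})

# The features matrix: a two-dimensional structure, one column per feature.
# The name "features" is an illustrative choice, not a required convention.
features = data[["age", "income"]]

# The value to predict: one label per row (an assumption for this
# supervised example; the text above only describes the features).
target = data["buyer"]

print(features.shape)  # (number of rows, number of features)

# Train a simple classifier on the features matrix.
model = DecisionTreeClassifier(random_state=0)
model.fit(features, target)

# Predict labels for new cases that have the same feature columns.
new_cases = pd.DataFrame({"age": [30, 55], "income": [35, 85]})
predictions = model.predict(new_cases)
print(predictions)
```

The same pattern applies to other scikit-learn estimators: the features matrix is passed as the first argument to `fit`, and new data for `predict` must have the same columns in the same order.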