Performing advanced analytics

Python offers many libraries for statistics, data mining, and machine learning. Probably the best known is the scikit-learn package, which provides most of the commonly used algorithms along with tools for data preparation and model evaluation.

In scikit-learn, you work with data in a tabular representation, typically NumPy arrays or pandas data frames. The input table (actually a two-dimensional array, not a table in the relational sense) holds the columns used to train the model. The columns, or attributes, represent features, so this table is also called the features matrix. There is no prescribed naming convention; however, in most Python code you will notice that this features matrix is stored ...
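To make the idea of a features matrix concrete, here is a minimal sketch that assumes pandas and scikit-learn are installed. The toy data, the column names (age, income, spend), and the variable names (data, features, target) are all hypothetical and chosen only for illustration; the choice of a linear regression model is likewise an assumption, not something the text prescribes.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: each row is one case, each column one attribute.
# The values are constructed so that spend = age + 2 * income exactly.
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 62],
    "income": [30, 45, 60, 80, 75],
    "spend": [85, 122, 167, 211, 212],
})

# The features matrix: a two-dimensional structure holding only the
# columns used to train the model (the variable name is illustrative).
features = data[["age", "income"]]

# The value the model should learn to predict (an assumption of this sketch).
target = data["spend"]

# Rows are cases, columns are features.
print(features.shape)

# Fit a simple model on the features matrix and predict for the same rows.
model = LinearRegression()
model.fit(features, target)
predictions = model.predict(features)
print(predictions)
```

The same pattern applies to other scikit-learn estimators: the features matrix is passed as the first argument to fit(), and a matrix with the same columns is passed to predict().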

