Python Utility Functions

This appendix includes several Python utilities that are used throughout the book. The utilities were designed to simplify the use of several frequently used data mining methods and to improve the display of their results. They are combined in the dmba package, which is available from the Python Package Index at https://pypi.org/project/dmba. Install the package using:

pip install dmba

The source code is maintained at https://github.com/gedeck/dmba.

regressionSummary

import math
import numpy as np
from sklearn.metrics import mean_squared_error

def regressionSummary(y_true, y_pred):
    """ print regression performance metrics 
    
    Input:
        y_true: actual values
        y_pred: predicted values
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    y_res = y_true - y_pred  # residuals
    metrics = [
        ('Mean Error (ME)', sum(y_res) / len(y_res)),
        ('Root Mean Squared Error (RMSE)', math.sqrt(mean_squared_error(y_true, y_pred))),
        ('Mean Absolute Error (MAE)', sum(abs(y_res)) / len(y_res)),
        # the percentage errors divide by y_true, so they are undefined if any actual value is 0
        ('Mean Percentage Error (MPE)', 100 * sum(y_res / y_true) / len(y_res)),
        ('Mean Absolute Percentage Error (MAPE)', 100 * sum(abs(y_res / y_true)) / len(y_res)),
    ]
    ]
    fmt1 = '{{:>{}}} : {{:.4f}}'.format(max(len(m[0]) for m in metrics))
    print('\nRegression statistics\n')
    for metric, value in metrics:
        print(fmt1.format(metric, value))
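
The five measures and the two-step format string can be checked by hand on a tiny data set. The following is a minimal sketch using only the standard library, not the dmba implementation: the helpers regression_metrics and format_metrics are hypothetical names introduced for illustration, the input values are made up, and the actual values are assumed to contain no zeros.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return the five error measures as a list of (name, value) pairs.

    Hypothetical helper for illustration; not part of dmba.
    Assumes equal-length, non-empty sequences with no zero in y_true.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return [
        ('ME', sum(residuals) / n),
        ('RMSE', math.sqrt(sum(r * r for r in residuals) / n)),
        ('MAE', sum(abs(r) for r in residuals) / n),
        ('MPE', 100 * sum(r / t for r, t in zip(residuals, y_true)) / n),
        ('MAPE', 100 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n),
    ]


def format_metrics(metrics):
    """Right-align the names to the longest one, as fmt1 does in the listing."""
    width = max(len(name) for name, _ in metrics)
    # Doubled braces survive the first format() call as literal braces,
    # so this yields a template such as '{:>4} : {:.4f}'.
    template = '{{:>{}}} : {{:.4f}}'.format(width)
    return [template.format(name, value) for name, value in metrics]


# Made-up values chosen so the results are easy to verify by hand.
actual = [10, 20, 40]
predicted = [12, 18, 40]
results = regression_metrics(actual, predicted)
lines = format_metrics(results)
print('\n'.join(lines))
```

With residuals of -2, 2, and 0, the positive and negative errors cancel in ME and MPE but not in MAE and MAPE, which is why the signed and absolute measures are reported side by side.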

classificationSummary

from sklearn.metrics import classification
def classificationSummary(y_true, ...
