How it works...

In this recipe, we performed equal-width discretization, that is, we sorted the variable values into intervals of equal width. We arbitrarily set the number of bins to 10 and then calculated the value range of the LSTAT variable, that is, the difference between its maximum and minimum values, using the pandas max() and min() methods. With NumPy's floor() and ceil() functions, we rounded the minimum value down and the maximum value up, respectively. We then estimated the interval width by dividing the value range by the number of bins. Finally, we captured the interval limits in a list, using the rounded minimum and maximum values and the interval width within a list comprehension.
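The steps described above can be sketched as follows. This is a minimal illustration rather than the recipe's exact code: the small DataFrame is a made-up stand-in for the LSTAT variable, and the names n_bins, min_value, max_value, width, and intervals are assumptions chosen for this example. The sketch also keeps the interval width as a float computed from the rounded limits, which is one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the LSTAT variable (values invented for illustration).
data = pd.DataFrame(
    {"LSTAT": [1.73, 4.98, 9.14, 12.43, 17.10, 22.60, 29.93, 37.97]}
)

# Number of bins, chosen arbitrarily as in the recipe.
n_bins = 10

# Round the minimum down and the maximum up so that every value
# falls inside the outermost interval limits.
min_value = int(np.floor(data["LSTAT"].min()))
max_value = int(np.ceil(data["LSTAT"].max()))

# Interval width: the value range divided by the number of bins.
width = (max_value - min_value) / n_bins

# n_bins intervals need n_bins + 1 limits; build them in a list comprehension.
intervals = [min_value + i * width for i in range(n_bins + 1)]

print(intervals)
```

Because the limits start at the rounded-down minimum and end at the rounded-up maximum, the first and last intervals may extend slightly beyond the observed values.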

To discretize ...
