Understanding the mathematics behind decision trees

The main goal of a decision tree algorithm is to identify a variable, and a classification (split) on that variable, that gives a more homogeneous distribution of the target variable. A homogeneous distribution means that similar values of the target variable are grouped together, so that a concrete decision can be made.

Homogeneity

In the preceding example, the first goal would be to find the parameter (out of four: Terrain, Rainfall, Groundwater, and Fertilizers) whose categories give the most homogeneous distribution of the target variable.

Before splitting on any parameter, the counts of each harvest type are as follows:

Bumper	Moderate	Meagre
4	9	7
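
As a rough illustration of how homogeneity can be put into numbers, the sketch below scores the counts above with Shannon entropy, one common impurity measure (lower means more homogeneous). The choice of entropy is an assumption here, not something this excerpt states. The helper names entropy and weighted_entropy and the two-category split are hypothetical, invented only to show the mechanics. They are not the real figures for Terrain, Rainfall, Groundwater, or Fertilizers.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    # Empty classes contribute nothing, so skip them to avoid log2(0).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def weighted_entropy(groups):
    """Entropy averaged over groups, each weighted by its share of rows."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * entropy(g) for g in groups)


# Counts from the table above, in the order Bumper, Moderate, Meagre.
baseline = [4, 9, 7]
print("entropy before any split:", round(entropy(baseline), 3))

# HYPOTHETICAL split into two categories, made up for illustration only.
# The per-class counts add up to the baseline (4, 9, 7).
hypothetical_split = [[4, 5, 1], [0, 4, 6]]
print("entropy after the split: ", round(weighted_entropy(hypothetical_split), 3))
```

A parameter whose split gives a lower weighted entropy than the baseline groups the harvest types more homogeneously. Repeating the comparison for every candidate parameter is one way to rank them.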

Let us calculate, for each parameter, ...

Python: Data Analytics and Visualization by Phuong Vo.T.H, Martin Czygan, Ashish Kumar, Kirthi Raman
