138 Empirical Research in Software Engineering
the input attributes be numerical. One machine learning technique that can
handle heterogeneous data is DT. Thus, if the data is heterogeneous, the researcher
may apply DT instead of other machine learning techniques (such as support vector
machines, neural networks, and nearest neighbor methods).
2. Redundancy in the data: Some independent variables may be redundant, that is,
highly correlated with other independent variables. It is advisable to remove
such variables to reduce the number of dimensions in the data set. Even after
this step, the data may sometimes still contain redundant information. In
this case, the researcher should make a careful selection of the data analysis ...
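
The removal of highly correlated independent variables described in point 2 can be sketched as follows. This is a minimal illustration, not a procedure prescribed by the text: the helper names (pearson, drop_redundant), the 0.9 threshold, the greedy keep-in-order strategy, and the metric values are all assumptions chosen for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; treated as 0 here (assumption).
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.9):
    """Return the names of the variables to keep, scanning in the given order.

    A variable is dropped when its absolute correlation with any variable
    already kept exceeds the threshold (hypothetical cutoff, 0.9 by default).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical metric values for five classes (illustrative data only).
metrics = {
    "loc": [100, 250, 400, 120, 800],
    "statements": [90, 230, 390, 110, 760],  # nearly proportional to loc
    "coupling": [3, 9, 2, 7, 4],
}

selected = drop_redundant(metrics, threshold=0.9)
print(selected)  # ['loc', 'coupling']
```

Here "statements" is dropped because it is almost perfectly correlated with "loc", while "coupling" is retained. The result depends on the order in which variables are scanned and on the chosen threshold, so both should be treated as choices to justify rather than fixed values.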