8 Binary Classification Techniques: An Application on Simulated and Real Bio-medical Data
This chapter investigates the performance of classification techniques for discrete variables associated with binomial outcomes. Specifically, it presents classification techniques based on multivariate indices and on machine learning methods, and evaluates their discriminative ability using simulated data as well as real Greek medical data. The techniques are assessed using criteria such as the area under the ROC curve, sensitivity and specificity. Their predictive ability and the statistical significance of their results are evaluated using Monte Carlo cross-validation. The results show that, for particular combinations of data distribution, number of features and range of measurement, specific techniques outperform all others on almost all validity criteria. Multivariate indices perform better when the number of features is small and their scale range is narrow. The chapter aims to propose a useful methodology for selecting suitable techniques for predicting a person’s true binomial outcome from discrete features.
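The evaluation scheme described above can be illustrated with a minimal sketch. It is not the chapter's own code: the simulated data generator, the unweighted sum score used as a stand-in for a multivariate index, the Youden-based cutoff and all function names are assumptions made for illustration. Monte Carlo cross-validation is implemented here as repeated random train/test splits, with the area under the ROC curve, sensitivity and specificity averaged over the repeats.

```python
import random
from statistics import mean


def simulate(n, n_features, levels, rng):
    """Hypothetical generator: discrete features in 0..levels-1.
    Cases (y = 1) take the max of two uniform draws, so they tend to score higher."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [
            max(rng.randrange(levels), rng.randrange(levels)) if y else rng.randrange(levels)
            for _ in range(n_features)
        ]
        data.append((x, y))
    return data


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when predicting 1 for score >= cutoff."""
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Cutoff maximising sensitivity + specificity (Youden's J) on the training split."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(scores, labels, c)))


def monte_carlo_cv(data, n_repeats=100, test_fraction=0.3, seed=0):
    """Repeated random splits; returns mean (AUC, sensitivity, specificity) on test data."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    results = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Assumed index: the unweighted sum of the discrete features.
        cutoff = best_cutoff([sum(x) for x, _ in train], [y for _, y in train])
        test_scores = [sum(x) for x, _ in test]
        test_labels = [y for _, y in test]
        se, sp = sens_spec(test_scores, test_labels, cutoff)
        results.append((auc(test_scores, test_labels), se, sp))
    return tuple(mean(col) for col in zip(*results))


if __name__ == "__main__":
    data = simulate(300, n_features=5, levels=4, rng=random.Random(1))
    print(monte_carlo_cv(data))
```

The cutoff is chosen on the training split only, so the reported sensitivity and specificity are out-of-sample estimates. Any other classifier that produces a score could be substituted for the sum score without changing the validation loop.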
8.1. Introduction
The binary classification of living beings (e.g. healthy or unhealthy), based on characteristics measured on a discrete scale, is an objective of many different scientific fields, such as medicine, psychometry and dietetics [CAR 83, ...