3
Biomarker-based prediction models: design and interpretation principles
This chapter introduces key techniques and applications for patient classification and disease prediction based on multivariate data analysis and machine learning, including instance-based learning, kernel methods, random optimization, and graphical and statistical learning models. It then analyzes prediction evaluation, model reporting and critical design issues, and discusses feature selection for biomarker discovery.
3.1 Biomarker discovery and prediction model development
Disease classification and risk prediction models are typically based on multivariate statistical models that combine several predictive factors. These models can be implemented with mathematical functions, non-parametric techniques, heuristic classification procedures and probabilistic prediction approaches. However, the outputs of multi-biomarker prediction models may not always be strongly correlated with a disease or phenotype, or may not fully reflect the inter-individual variability associated with the predicted outcome. Moreover, deliberately selecting biomarkers from relatively well-studied functional pathways may introduce bias and may not account for the functional interdependence and diversity inherent in complex diseases.
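As a rough illustration of a multivariate model implemented with a mathematical function, the following minimal sketch fits a logistic regression classifier to synthetic data. Everything in it is an assumption made for illustration: the three "biomarkers", the simulated patients, the function names (`fit_logistic`, `predict_risk`, `make_patient`) and the learning settings are hypothetical and are not taken from any study. The first two simulated biomarkers are shifted in diseased patients, while the third is pure noise.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Difference between predicted risk and observed label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    """Return the modelled probability of disease for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic cohort (assumption): 200 patients, alternating healthy/diseased.
rng = random.Random(0)


def make_patient(diseased):
    shift = 1.5 if diseased else 0.0
    # Biomarkers 1 and 2 are informative; biomarker 3 is noise.
    return [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)]


X = [make_patient(i % 2 == 1) for i in range(200)]
y = [i % 2 for i in range(200)]

w, b = fit_logistic(X, y)

# Accuracy on the training data itself, which is an optimistic estimate.
accuracy = sum(
    (predict_risk(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)
) / len(X)
print("weights:", [round(wj, 2) for wj in w], "bias:", round(b, 2))
print("training accuracy:", round(accuracy, 3))
```

The accuracy printed here is computed on the same patients used for fitting, so it overstates how the model would perform on new patients. The sketch uses only the standard library to stay self-contained; in practice an established statistical or machine learning package would normally be used.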
Typical examples of biomarker-based clinical classification systems are: the classification of healthy vs. diseased patients, the classification of survival/death outcomes, ...