5 Biomarkers and gene expression data analysis
This chapter will introduce gene expression data analysis in the context of biomarker discovery. It first reviews fundamental statistical concepts and problems in the design of disease classification and prediction models, and then discusses recent advances and applications in different medical domains. The discussion is organized around the following topics: (a) biomedical findings and clinical applications, (b) the statistical and data mining methodologies applied, and (c) their strengths and limitations.
5.1 Introduction
Changes in gene expression can be measured with a variety of techniques, ranging from small- to large-scale approaches, that differ in their reliability and genome coverage: Northern blotting, real-time polymerase chain reaction (RT-PCR), serial analysis of gene expression (SAGE), multiplex PCR and different types of DNA microarrays. These tools allow the detection of differentially expressed genes, that is, genes that are up- or down-regulated in relation to specific clinical conditions or functional pathways. Such studies may be extended by follow-up or validation studies using additional gene expression data measured on alternative experimental platforms, or by other ‘omic’ approaches, such as proteomics. The large-scale acquisition of gene expression data has allowed the design of different biomarker models for diagnostic and prognostic applications in cancer, cardiovascular ...