Appendix A. Bioset
Biosets (also called gene signatures¹ or assays²) encompass data in the form of experimental sample comparisons (for transcriptomic, epigenetic, and copy-number variation data), as well as genotype signatures (for genome-wide association study [GWAS] and mutational data).
A bioset has an associated data type, which can be gene expression, protein expression, methylation, copy-number variation, miRNA, or somatic mutation. Each bioset entry/record also has an associated reference type, which can be r1=normal, r2=disease, r3=paired, or r4=unknown. The reference type does not apply to the somatic mutation data type.
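The data types and reference types described above can be modeled roughly as follows. This is a minimal sketch, not a definitive schema: the class names `DataType`, `ReferenceType`, and `BiosetRecord`, and the `bioset_id` field, are hypothetical and chosen only for illustration. For simplicity, the sketch stores the data type on each record, although the text associates it with the bioset as a whole. It also assumes that every record of a non-somatic data type carries a reference type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    """Bioset data types listed in the text."""
    GENE_EXPRESSION = "gene expression"
    PROTEIN_EXPRESSION = "protein expression"
    METHYLATION = "methylation"
    COPY_NUMBER_VARIATION = "copy-number variation"
    MIRNA = "miRNA"
    SOMATIC_MUTATION = "somatic mutation"


class ReferenceType(Enum):
    """Reference types r1..r4 listed in the text."""
    NORMAL = "r1"
    DISEASE = "r2"
    PAIRED = "r3"
    UNKNOWN = "r4"


@dataclass(frozen=True)
class BiosetRecord:
    """One entry/record of a bioset (hypothetical layout)."""
    bioset_id: str  # hypothetical identifier, not defined in the text
    data_type: DataType
    reference_type: Optional[ReferenceType] = None

    def __post_init__(self) -> None:
        # A reference type does not apply to somatic mutation data.
        if self.data_type is DataType.SOMATIC_MUTATION:
            if self.reference_type is not None:
                raise ValueError(
                    "reference type does not apply to somatic mutation data"
                )
        # Assumption: all other data types require a reference type.
        elif self.reference_type is None:
            raise ValueError("reference type is required for this data type")
```

For example, `BiosetRecord("b1", DataType.GENE_EXPRESSION, ReferenceType.NORMAL)` is accepted, whereas passing a reference type together with `DataType.SOMATIC_MUTATION` raises a `ValueError`.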
The number of entries/records per bioset depends on its data type (see Table A-1).

Table A-1. Number of entries/records by bioset data type

| Bioset data type | Number of entries/records |
|---|---|
| Somatic mutation | 3,000–20,000 |
| Methylation | 30,000 |
| Gene expression | 50,000 |
| Copy-number variation | 40,000 |
| Germline | 4,300,000 |
| Protein expression | 30,000 |
| miRNA | 30,000 |
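
The per-bioset record counts in the table can be used for rough capacity planning. The following sketch estimates a low and high bound on the total number of records for a collection of biosets. The names `RECORDS_PER_BIOSET` and `estimate_total_records`, and the bioset counts in the usage note, are hypothetical; only the per-type record counts come from the table.

```python
# Record counts per bioset, taken from the table above.
# Each value is a (low, high) pair; fixed counts use the same number twice.
RECORDS_PER_BIOSET = {
    "somatic mutation": (3_000, 20_000),
    "methylation": (30_000, 30_000),
    "gene expression": (50_000, 50_000),
    "copy-number variation": (40_000, 40_000),
    "germline": (4_300_000, 4_300_000),
    "protein expression": (30_000, 30_000),
    "mirna": (30_000, 30_000),
}


def estimate_total_records(bioset_counts):
    """Return (low, high) bounds on the total record count.

    bioset_counts maps a data type name to the number of biosets
    of that type (hypothetical input, for illustration only).
    """
    low = high = 0
    for data_type, n_biosets in bioset_counts.items():
        per_low, per_high = RECORDS_PER_BIOSET[data_type.lower()]
        low += n_biosets * per_low
        high += n_biosets * per_high
    return low, high
```

With a hypothetical collection of 100 gene expression biosets and 10 somatic mutation biosets, `estimate_total_records({"gene expression": 100, "somatic mutation": 10})` returns `(5030000, 5200000)`.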
¹ A gene signature is a group of genes in a cell whose combined expression pattern is uniquely characteristic of a biological phenotype or medical condition. The phenotypes that may theoretically be defined by a gene expression signature range from those that differentiate between subtypes of a disease, to those that predict the survival or prognosis of an individual with a disease, to those that predict activation of a particular pathway. Ideally, gene signatures can be used ...