Appendix A. Bioset
Biosets (also called gene signatures¹ or assays²) encompass data in the form of experimental sample comparisons (for transcriptomic, epigenetic, and copy-number variation data), as well as genotype signatures (for genome-wide association study [GWAS] and mutational data).
A bioset has an associated data type, which can be gene expression, protein expression, methylation, copy-number variation, miRNA, or somatic mutation. Also, each bioset entry/record has an associated reference type, which can be r1 = normal, r2 = disease, r3 = paired, or r4 = unknown. Note that a reference type does not apply to the somatic mutation data type.
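The data type and reference type rules above can be sketched as a small data model. This is only an illustrative sketch: the class names (`Bioset`, `BiosetRecord`), the identifier fields, and the `add` method are hypothetical and not taken from the book. It also assumes that every record of a non-somatic-mutation bioset carries one of the four reference types.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class DataType(Enum):
    """Bioset data types listed in the text."""
    GENE_EXPRESSION = "gene expression"
    PROTEIN_EXPRESSION = "protein expression"
    METHYLATION = "methylation"
    COPY_NUMBER_VARIATION = "copy-number variation"
    MIRNA = "miRNA"
    SOMATIC_MUTATION = "somatic mutation"


class ReferenceType(Enum):
    """Reference types r1..r4 listed in the text."""
    NORMAL = "r1"
    DISEASE = "r2"
    PAIRED = "r3"
    UNKNOWN = "r4"


@dataclass(frozen=True)
class BiosetRecord:
    # record_id is a hypothetical field used only for illustration.
    record_id: str
    # None means "no reference type", which is valid only for somatic mutation.
    reference_type: Optional[ReferenceType] = None


@dataclass
class Bioset:
    # bioset_id is a hypothetical field used only for illustration.
    bioset_id: str
    data_type: DataType
    records: List[BiosetRecord] = field(default_factory=list)

    def add(self, record: BiosetRecord) -> None:
        """Append a record after checking the reference type rule."""
        has_reference = record.reference_type is not None
        is_somatic = self.data_type is DataType.SOMATIC_MUTATION
        if is_somatic and has_reference:
            raise ValueError("reference type does not apply to somatic mutation")
        if not is_somatic and not has_reference:
            # Assumption: non-somatic records always carry r1..r4.
            raise ValueError("record needs a reference type (r1..r4)")
        self.records.append(record)


expression = Bioset("bioset-1", DataType.GENE_EXPRESSION)
expression.add(BiosetRecord("entry-1", ReferenceType.NORMAL))

mutations = Bioset("bioset-2", DataType.SOMATIC_MUTATION)
mutations.add(BiosetRecord("entry-1"))
```

Holding the data type on the bioset and the reference type on each record mirrors the wording of the text, where the former is a property of the bioset and the latter a property of each entry.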
The number of entries/records per bioset depends on its data type (see Table A-1).
Bioset data type | Number of entries/records |
---|---|
Somatic mutation | 3,000–20,000 |
Methylation | 30,000 |
Gene expression | 50,000 |
Copy-number variation | 40,000 |
Germline | 4,300,000 |
Protein expression | 30,000 |
miRNA | 30,000 |
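As a rough illustration of how these per-bioset counts translate into overall data volume, the following sketch stores the table as a lookup and sums a low and high estimate for a collection of biosets. The function name `estimate_total_entries` and the dictionary layout are hypothetical. Single-valued rows are treated as a range whose low and high bounds are equal.

```python
from typing import Dict, Tuple

# (low, high) number of entries/records per bioset, copied from the table.
ENTRIES_PER_BIOSET: Dict[str, Tuple[int, int]] = {
    "somatic mutation": (3_000, 20_000),
    "methylation": (30_000, 30_000),
    "gene expression": (50_000, 50_000),
    "copy-number variation": (40_000, 40_000),
    "germline": (4_300_000, 4_300_000),
    "protein expression": (30_000, 30_000),
    "miRNA": (30_000, 30_000),
}


def estimate_total_entries(bioset_counts: Dict[str, int]) -> Tuple[int, int]:
    """Return (low, high) total entries for the given number of biosets per data type."""
    low_total = 0
    high_total = 0
    for data_type, number_of_biosets in bioset_counts.items():
        low, high = ENTRIES_PER_BIOSET[data_type]
        low_total += number_of_biosets * low
        high_total += number_of_biosets * high
    return low_total, high_total


# Ten gene expression biosets plus one somatic mutation bioset.
print(estimate_total_entries({"gene expression": 10, "somatic mutation": 1}))
# prints (503000, 520000)
```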
¹ A gene signature is a group of genes in a cell whose combined expression pattern is uniquely characteristic of a biological phenotype or medical condition. The phenotypes that may theoretically be defined by a gene expression signature range from those that differentiate between subtypes of a disease, through those that predict the survival or prognosis of an individual with a disease, to those that predict activation of a particular pathway. Ideally, gene signatures can be used ...