Data Algorithms

by Mahmoud Parsian
July 2015
Intermediate to advanced
778 pages
17h 9m
English
O'Reilly Media, Inc.
Content preview from Data Algorithms

Appendix A. Bioset

Biosets (also called gene signatures¹ or assays²) encompass data in the form of experimental sample comparisons (for transcriptomic, epigenetic, and copy-number variation data), as well as genotype signatures (for genome-wide association study [GWAS] and mutational data).

A bioset has an associated data type, which can be gene expression, protein expression, methylation, copy-number variation, miRNA, or somatic mutation. Also, each bioset entry/record has an associated reference type, which can be r1=normal, r2=disease, r3=paired, or r4=unknown. Note that a reference type does not apply to the somatic mutation data type.
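The data types and reference types listed above can be modeled as two small enumerations. The following is a minimal sketch, not the book's implementation: the names `DataType`, `ReferenceType`, and `BiosetRecord` are hypothetical, and storing the bioset's data type on each record is a simplifying assumption made only so the "no reference type for somatic mutation" rule can be checked in one place.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    """Bioset data types named in the text (hypothetical enum)."""
    GENE_EXPRESSION = "gene expression"
    PROTEIN_EXPRESSION = "protein expression"
    METHYLATION = "methylation"
    COPY_NUMBER_VARIATION = "copy-number variation"
    MIRNA = "miRNA"
    SOMATIC_MUTATION = "somatic mutation"


class ReferenceType(Enum):
    """Reference types r1..r4 named in the text (hypothetical enum)."""
    NORMAL = "r1"
    DISEASE = "r2"
    PAIRED = "r3"
    UNKNOWN = "r4"


@dataclass(frozen=True)
class BiosetRecord:
    """One bioset entry; carrying data_type here is an assumption."""
    data_type: DataType
    reference_type: Optional[ReferenceType] = None

    def __post_init__(self) -> None:
        is_somatic = self.data_type is DataType.SOMATIC_MUTATION
        # A reference type does not apply to somatic mutation data.
        if is_somatic and self.reference_type is not None:
            raise ValueError("reference type does not apply to somatic mutation")
        # Every other record has a reference type (r4 covers "unknown").
        if not is_somatic and self.reference_type is None:
            raise ValueError("reference type is required for this data type")


if __name__ == "__main__":
    record = BiosetRecord(DataType.METHYLATION, ReferenceType.DISEASE)
    print(record.data_type.value, record.reference_type.value)
```

Using `UNKNOWN` (r4) rather than `None` for records whose reference is not known keeps `None` reserved for the one data type where the field does not apply.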

The number of entries/records per bioset depends on its data type (see Table A-1).

Table A-1. Number of records per bioset data type

Bioset data type         Number of entries/records
Somatic mutation         3,000–20,000
Methylation              30,000
Gene expression          50,000
Copy-number variation    40,000
Germline                 4,300,000
Protein expression       30,000
miRNA                    30,000
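The record counts in Table A-1 can be used for rough capacity planning, for example when estimating how many records a collection of biosets will produce. The sketch below is illustrative only: the dictionary `RECORDS_PER_BIOSET` and the functions `expected_records` and `estimate_total_records` are hypothetical names, and treating the table's figures as (low, high) bounds per bioset is an assumption.

```python
# (low, high) records per bioset, copied from Table A-1.
# Single-valued rows are stored with low == high.
RECORDS_PER_BIOSET = {
    "somatic mutation": (3_000, 20_000),
    "methylation": (30_000, 30_000),
    "gene expression": (50_000, 50_000),
    "copy-number variation": (40_000, 40_000),
    "germline": (4_300_000, 4_300_000),
    "protein expression": (30_000, 30_000),
    "mirna": (30_000, 30_000),
}


def expected_records(data_type: str) -> tuple[int, int]:
    """Return the (low, high) record count for one bioset of this type."""
    return RECORDS_PER_BIOSET[data_type.strip().lower()]


def estimate_total_records(bioset_counts: dict[str, int]) -> tuple[int, int]:
    """Sum (low, high) record counts over a mapping of data type -> number of biosets."""
    low_total = 0
    high_total = 0
    for data_type, n_biosets in bioset_counts.items():
        low, high = expected_records(data_type)
        low_total += low * n_biosets
        high_total += high * n_biosets
    return low_total, high_total


if __name__ == "__main__":
    # Two gene expression biosets plus one somatic mutation bioset.
    print(estimate_total_records({"Gene expression": 2, "Somatic mutation": 1}))
```

Keys are matched case-insensitively, so `"miRNA"` and `"mirna"` resolve to the same row; an unlisted data type raises `KeyError` rather than silently contributing zero.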

1 A gene signature is a group of genes in a cell whose combined expression pattern is uniquely characteristic of a biological phenotype or medical condition. The phenotypes that may theoretically be defined by a gene expression signature range from those used to differentiate between subtypes of a disease, through those that predict the survival or prognosis of an individual with a disease, to those that predict activation of a particular pathway. Ideally, gene signatures can be used ...

ISBN: 9781491906170