Chapter 9. Working with Range Data

Here is a problem related to yours and solved before. Could you use it? Could you use its result? Could you use its method?

How to Solve It, George Pólya (1945)

Luckily for bioinformaticians, every genome from every branch of life on Earth consists of chromosome sequences that can be represented on a computer in the same way: as a set of nucleotide sequences (genomic variation and assembly uncertainty aside). Each separate sequence represents a reference DNA molecule, which may correspond to a fully assembled chromosome, or to a scaffold or contig in a partially assembled genome. Although nucleotide sequences are stored as linear strings, they may also represent biologically circular chromosomes (e.g., plasmids or mitochondrial genomes) that have been cut. In addition to containing nucleotide sequences (the As, Ts, Cs, and Gs of life), these reference sequences act as our coordinate system for describing the location of everything in a genome. Moreover, because the units of these chromosomal sequences are individual base pairs, there’s no finer resolution we could use to specify a location on a genome.

Using this coordinate system, we can describe a location or region on a genome as a range on a linear chromosome sequence. Why is this important? Many types of genomic data are linked to a specific genomic region, and this region can be represented as a range containing consecutive positions on a chromosome. Annotation data and genomic features like gene models, genetic ...
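To make the idea of a range on a chromosome sequence concrete, here is a minimal Python sketch. The `GenomicRange` class, its field names, and the example coordinates are all hypothetical and are not taken from the chapter. The sketch also assumes 0-based, half-open coordinates, since the text above does not specify a convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicRange:
    """A run of consecutive positions on one reference sequence.

    Assumption: 0-based, half-open coordinates [start, end).
    """
    seqname: str  # name of the chromosome, scaffold, or contig
    start: int    # first position included in the range
    end: int      # first position past the end of the range

    def __post_init__(self):
        # Reject ranges that cannot exist on a linear sequence.
        if self.start < 0 or self.end < self.start:
            raise ValueError("need 0 <= start <= end")

    @property
    def width(self) -> int:
        # Number of base pairs covered by the range.
        return self.end - self.start

    def overlaps(self, other: "GenomicRange") -> bool:
        # Ranges on different reference sequences never overlap.
        return (self.seqname == other.seqname
                and self.start < other.end
                and other.start < self.end)


gene = GenomicRange("chr1", 1_000, 2_500)  # hypothetical gene model
read = GenomicRange("chr1", 2_400, 2_550)  # hypothetical aligned read
print(gene.width)           # 1500
print(gene.overlaps(read))  # True
```

Because every feature reduces to a sequence name plus two integers, questions such as "does this read fall inside this gene?" become simple integer comparisons, whatever kind of data the range describes.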
