COMPUTING GENOMIC DISTANCES: AN ALGORITHMIC VIEWPOINT
34.1.1 What this Chapter is About
Comparative genomics is a field of bioinformatics whose goal is to compare several species by comparing their genomes, in order to understand how the species under study have evolved over time. Such studies lead, for instance, to reconstructing putative ancestral genomes, building phylogenetic trees, or inferring the function of genes or sets of genes.
One of the main activities of comparative genomics consists of comparing pairs of genomes to identify their common features and, in turn, to determine what differentiates them. In this setting, genomes are usually modeled as sequences of genes in which each gene is identified by a (possibly signed) label. The sign + or –, if present, indicates the DNA strand on which the gene lies. The order of the genes in the studied genomes is thus the main information we are given. How this order was obtained is outside our scope here; only the order itself is taken into account.
Note also that a genome may contain several occurrences of the same gene (possibly carrying different signs, if signs are present). In that case, we say that the genome contains duplicates. Genes may indeed be duplicated during evolution, and duplicate genes occur frequently in all living species.
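To make the model concrete, here is a minimal sketch in Python. It rests on an encoding assumed here for illustration only: a genome is a list of nonzero integers, where the absolute value is the gene label and the sign is the strand. The function names (gene_counts, has_duplicates, common_genes) and the two example genomes are hypothetical and do not come from the chapter.

```python
from collections import Counter
from typing import List, Set

# Assumed encoding (illustrative): a genome is a list of nonzero integers.
# abs(g) is the gene label; the sign of g is the DNA strand (+ or -).
Genome = List[int]


def gene_counts(genome: Genome) -> Counter:
    """Count the occurrences of each gene label, ignoring signs."""
    return Counter(abs(g) for g in genome)


def has_duplicates(genome: Genome) -> bool:
    """Return True if some gene label occurs more than once."""
    return any(count > 1 for count in gene_counts(genome).values())


def common_genes(g1: Genome, g2: Genome) -> Set[int]:
    """Return the gene labels that appear in both genomes, ignoring signs."""
    return set(gene_counts(g1)) & set(gene_counts(g2))


# Hypothetical example genomes: gene 3 occurs twice in genome_a,
# both times on the minus strand.
genome_a = [+1, -3, +2, +4, -3]
genome_b = [+1, +2, +3, +4]

print(has_duplicates(genome_a))   # True
print(has_duplicates(genome_b))   # False
print(sorted(common_genes(genome_a, genome_b)))  # [1, 2, 3, 4]
```

Ignoring signs when counting occurrences follows the remark above that copies of the same gene may carry different signs.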
Comparing pairs of genomes on that basis can be done roughly in two different ...