CHAPTER 20
ALGORITHMIC ISSUES IN THE ANALYSIS OF CHIP-SEQ DATA
20.1 INTRODUCTION
Researchers in biology and medicine now have at their disposal enormous amounts of data and information, which provide an unprecedented opportunity to gain novel insights into the molecular basis of life and disease. The completion of several genome projects has yielded the (almost) complete DNA sequence of the human genome and of the genomes of many other organisms of interest, ranging from viruses and bacteria to plants and animals. This, in turn, has enabled the large-scale annotation of genes and their products, the building blocks of life. Technologies such as oligonucleotide microarrays, on the other hand, make it possible to measure the level of transcription of genes, that is, when and how much a given gene is activated depending on developmental stage, cell cycle, external stimuli, disease, and so on. All in all, the emerging picture is that gene expression is a process finely modulated by the cell at every stage. Gene expression is the series of steps in which a DNA region is transcribed into an RNA sequence, which in turn is translated into a protein. Thus, only when the regulation of this process is also fully understood will we be able to obtain a complete picture of the mechanisms acting inside every living cell.
The first step of gene expression, the transcription of a DNA region into a complementary RNA sequence, is finely regulated by the activity of transcription ...