Chapter 3. Reverse Complement of DNA: String Manipulation
The Rosalind REVC challenge explains that the bases of DNA form pairs of A-T and G-C. Additionally, DNA has directionality and is usually read from the 5'-end (five-prime end) toward the 3'-end (three-prime end). As shown in Figure 3-1, the complement of the DNA string AAAACCCGGT is TTTTGGGCCA. I then reverse this string (reading from the 3'-end) to get ACCGGGTTTT as the reverse complement.
Figure 3-1. The reverse complement of DNA is the complement read from the opposite direction
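The two steps described above, complementing each base and then reading the result from the other end, can be sketched in a few lines. This is only a minimal illustration and not the solution the chapter builds: the names COMPLEMENT and reverse_complement are hypothetical, and the sketch assumes the input contains only the uppercase bases A, C, G, and T.

```python
# Hypothetical lookup table: each base maps to its pairing partner (A-T, G-C).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Complement each base, then read the result in the opposite direction.

    Assumes uppercase A, C, G, T only; any other character raises KeyError.
    """
    # Step 1: complement every base, e.g. AAAACCCGGT -> TTTTGGGCCA
    complement = "".join(COMPLEMENT[base] for base in dna)
    # Step 2: reverse the complemented string with a slice
    return complement[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

The printed value matches the reverse complement given in the text for AAAACCCGGT.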
Although you can find many existing tools to generate the reverse complement of DNA—and I’ll drop a spoiler alert that the final solution will use a function from the Biopython library—the point of writing our own algorithm is to explore Python. In this chapter, you will learn:
- How to implement a decision tree using a dictionary as a lookup table
- How to dynamically generate a list or a string
- How to use the reversed() function, which is an example of an iterator
- How Python treats strings and lists similarly
- How to use a list comprehension to generate a list
- How to use str.maketrans() and str.translate() to transform a string
- How to use Biopython’s Bio.Seq module
- That the real treasure is the friends you make along the way
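As a quick preview of a few items in this list, the sketch below uses only the standard library. The names TABLE and revc_translate are hypothetical and chosen for illustration, and the translation table assumes uppercase bases only; the chapter's own versions may differ.

```python
# str.maketrans() builds a translation table: A->T, C->G, G->C, T->A.
TABLE = str.maketrans("ACGT", "TGCA")


def revc_translate(dna: str) -> str:
    """Reverse complement using str.translate() and reversed()."""
    # translate() swaps each base for its complement in a single pass;
    # reversed() yields the characters back to front, and join() rebuilds a string.
    return "".join(reversed(dna.translate(TABLE)))


# reversed() returns an iterator: values are produced on demand,
# and calling iter() on it gives back the same object.
rev = reversed("ACG")
is_iterator = iter(rev) is rev

# Strings and lists are both sequences, so reversed() accepts either one.
from_string = list(reversed("ACG"))
from_list = list(reversed(["A", "C", "G"]))

# A list comprehension generates a new list from an existing sequence.
lowered = [base.lower() for base in "ACG"]

print(revc_translate("AAAACCCGGT"))  # ACCGGGTTTT
```

Because the table is built once at module level, repeated calls to the hypothetical revc_translate() reuse it rather than rebuilding the mapping each time.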
Getting Started
The code and tests for this program are in the 03_revc directory. To get a feel for how ...