Chapter 3. Reverse Complement of DNA: String Manipulation

The Rosalind REVC challenge explains that the bases of DNA form pairs of A-T and G-C. Additionally, DNA has directionality and is usually read from the 5'-end (five-prime end) toward the 3'-end (three-prime end). As shown in Figure 3-1, the complement of the DNA string AAAACCCGGT is TTTTGGGCCA. I then reverse this string (reading from the 3'-end) to get ACCGGGTTTT as the reverse complement.

Figure 3-1. The reverse complement of DNA is the complement read from the opposite direction
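
The two steps shown in the figure (complement each base, then read the result backward) can be sketched in a few lines of Python. This is only an illustration, not the chapter's final solution: the names COMPLEMENT and revc are hypothetical, and the sketch assumes the input contains only the uppercase bases A, C, G, and T.

```python
# Lookup table mapping each base to its complement.
# Assumption: input is uppercase and contains only A, C, G, T.
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def revc(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    # Step 1: complement every base, keeping the original order.
    complement = ''.join(COMPLEMENT[base] for base in dna)
    # Step 2: reverse the complemented string with a slice.
    return complement[::-1]


print(revc('AAAACCCGGT'))  # ACCGGGTTTT
```

Any character missing from the lookup table raises a KeyError here, which makes unexpected input visible rather than silently passing it through.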

Although you can find many existing tools to generate the reverse complement of DNA—and I’ll drop a spoiler alert that the final solution will use a function from the Biopython library—the point of writing our own algorithm is to explore Python. In this chapter, you will learn:

  • How to implement a decision tree using a dictionary as a lookup table

  • How to dynamically generate a list or a string

  • How to use the reversed() function, which is an example of an iterator

  • How Python treats strings and lists similarly

  • How to use a list comprehension to generate a list

  • How to use str.maketrans() and str.translate() to transform a string

  • How to use Biopython’s Bio.Seq module

  • That the real treasure is the friends you make along the way
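
As a rough preview of two items from this list, the following sketch combines str.maketrans() and str.translate() with the reversed() iterator, using only the standard library. The names TRANS and revc_translate are hypothetical, and including lowercase bases in the table is an assumption made for illustration.

```python
# Build a translation table once: each character in the first string
# maps to the character at the same position in the second string.
TRANS = str.maketrans('ACGTacgt', 'TGCAtgca')


def revc_translate(dna: str) -> str:
    """Return the reverse complement using str.translate() and reversed()."""
    # translate() swaps each base for its complement in one pass.
    complement = dna.translate(TRANS)
    # reversed() returns an iterator over the characters in reverse order,
    # so join() is needed to turn it back into a string.
    return ''.join(reversed(complement))


print(revc_translate('AAAACCCGGT'))  # ACCGGGTTTT
```

Unlike a dictionary lookup, str.translate() leaves characters that are absent from the table unchanged, so this version does not raise an error on unexpected input.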

Getting Started

The code and tests for this program are in the 03_revc directory. To get a feel ...
