Chapter 13. Locating Restriction Sites: Using, Testing, and Sharing Code
A palindromic sequence in DNA is one in which the 5’ to 3’ base sequence is identical on both strands. For example, Figure 13-1 shows that the reverse complement of the DNA sequence GCATGC is the sequence itself.
Figure 13-1. A reverse palindrome is equal to its reverse complement
I can verify this in code:
>>> from Bio import Seq
>>> seq = 'GCATGC'
>>> Seq.reverse_complement(seq) == seq
True
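The same check can be sketched in plain Python without Biopython, which makes the two steps of a reverse complement explicit: complement every base, then reverse the string. The names `COMPLEMENT`, `reverse_complement`, and `is_reverse_palindrome` are illustrative only, not part of the chapter's code, and the sketch assumes an uppercase sequence containing only A, C, G, and T:

```python
# Translation table mapping each base to its complement (assumes uppercase ACGT)
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True if seq is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


print(reverse_complement('GCATGC'))     # GCATGC
print(is_reverse_palindrome('GCATGC'))  # True
print(is_reverse_palindrome('GCATGA'))  # False
```

Using `str.translate` keeps the complement step to a single pass over the string, and the slice `[::-1]` handles the reversal.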
As described in the Rosalind REVP challenge, restriction enzymes recognize and cut within specific palindromic sequences of DNA known as restriction sites.
They typically have a length of between 4 and 12 nucleotides.
The goal of this exercise is to find the location of every putative restriction site in a DNA sequence.
The code to solve this problem could be massively complicated, but a clear understanding of some functional programming techniques helps to create a short, elegant solution.
I will explore map(), zip(), and enumerate() as well as many small, tested functions.
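As a rough illustration of how these pieces fit together, here is a minimal sketch of one possible search, not the chapter's solution. The helper names (`revcomp`, `find_kmers`, `find_sites`) are hypothetical, the input is assumed to be an uppercase ACGT string, and reporting each hit as a 1-based position plus a length is an assumption about the desired output format:

```python
from typing import List, Tuple

# Map each base to its complement (assumes uppercase ACGT input)
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping substring of length k (empty if k > len(seq))."""
    n = len(seq) - k + 1
    return [] if n < 1 else [seq[i:i + k] for i in range(n)]


def find_sites(seq: str, min_len: int = 4,
               max_len: int = 12) -> List[Tuple[int, int]]:
    """Return sorted (1-based position, length) pairs for each reverse palindrome."""
    hits = []
    for k in range(min_len, max_len + 1):
        kmers = find_kmers(seq, k)
        # map() computes each k-mer's reverse complement, zip() pairs it with
        # the k-mer, and enumerate() supplies the 0-based start position
        for i, (kmer, rc) in enumerate(zip(kmers, map(revcomp, kmers))):
            if kmer == rc:
                hits.append((i + 1, k))
    return sorted(hits)


for pos, length in find_sites('GCATGC'):
    print(pos, length)
# 1 6
# 2 4
```

On GCATGC this reports the whole sequence at position 1 (length 6) and the inner CATG at position 2 (length 4). Odd lengths can never match, because no base is its own complement, so a faster variant could step `k` by 2.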
You will learn:
- How to find a reverse palindrome
- How to create modules to share common functions
- About the PYTHONPATH environment variable
Getting Started
The code and tests for this exercise are in the 13_revp directory.
Start by copying a solution to the program revp.py:
$ cd 13_revp
$ cp solution1_zip_enumerate.py revp.py
Inspect ...