Chapter 13. Locating Restriction Sites: Using, Testing, and Sharing Code

A palindromic sequence in DNA is one in which the 5’ to 3’ base pair sequence is identical on both strands. For example, Figure 13-1 shows that the reverse complement of the DNA sequence GCATGC is the sequence itself.

Figure 13-1. A reverse palindrome is equal to its reverse complement

I can verify this in code:

>>> from Bio import Seq
>>> seq = 'GCATGC'
>>> Seq.reverse_complement(seq) == seq
True
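
To make the check less of a black box, here is a minimal sketch of the same test in plain Python, assuming the input contains only the bases A, C, G, and T (in either case). The names revcomp and is_reverse_palindrome are hypothetical helpers introduced for illustration; they are not part of Biopython:

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    # Swap each base for its complement, then reverse the whole string.
    complement = str.maketrans('ACGTacgt', 'TGCAtgca')
    return seq.translate(complement)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == revcomp(seq)


print(is_reverse_palindrome('GCATGC'))
```

Any character outside the assumed alphabet passes through translate() unchanged, so a real program would probably want to validate its input first.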

As described in the Rosalind REVP challenge, restriction enzymes recognize and cut within specific palindromic sequences of DNA known as restriction sites. These sites are typically between 4 and 12 nucleotides long. The goal of this exercise is to find the location in a DNA sequence of every putative restriction site. The code to solve this problem could be massively complicated, but a clear understanding of some functional programming techniques helps to create a short, elegant solution. I will explore map(), zip(), and enumerate() as well as many small, tested functions.
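
To give a feel for how map(), zip(), and enumerate() might combine here, the following is a rough sketch under stated assumptions, not necessarily the solution this chapter develops. The function names (revcomp, find_kmers, revp) and the min_k and max_k parameters are hypothetical, and the input is assumed to be an uppercase string of A, C, G, and T:

```python
from typing import List, Tuple


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (hypothetical helper)."""
    return seq.translate(str.maketrans('ACGT', 'TGCA'))[::-1]


def find_kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def revp(seq: str, min_k: int = 4, max_k: int = 12) -> List[Tuple[int, int]]:
    """Return sorted (position, length) pairs of reverse palindromes.

    Positions are 1-based, as in the Rosalind REVP output.
    """
    sites = []
    for k in range(min_k, max_k + 1):
        kmers = find_kmers(seq, k)
        # map() computes each reverse complement; zip() pairs it with the
        # original k-mer; enumerate() supplies the 1-based position.
        pairs = zip(kmers, map(revcomp, kmers))
        for pos, (kmer, rc) in enumerate(pairs, start=1):
            if kmer == rc:
                sites.append((pos, k))
    return sorted(sites)


for pos, length in revp('TCAATGCATGCGGGTCTATATGCAT'):
    print(pos, length)
```

Splitting the work into small functions like these means each one can be tested on its own, which is the style I follow throughout the exercise.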

You will learn:

  • How to find a reverse palindrome

  • How to create modules to share common functions

  • About the PYTHONPATH environment variable

Getting Started

The code and tests for this exercise are in the 13_revp directory. Start by copying a solution to the program revp.py:

$ cd 13_revp
$ cp solution1_zip_enumerate.py revp.py

Inspect ...
