Chapter 2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files

To express the proteins necessary to sustain life, regions of DNA must be transcribed into a form of RNA called messenger RNA (mRNA). While there are many fascinating biochemical differences between DNA and RNA, for our purposes the only difference is that all the characters T representing the base thymine in a sequence of DNA need to be changed to the letter U, for uracil. As described on the Rosalind RNA page, the program I’ll show you how to write will accept a string of DNA like ACGT and print the transcribed mRNA ACGU. I can use Python’s str.replace() function to accomplish this in one line:

>>> 'GATGGAACTTGACTACGTAAATT'.replace('T', 'U')
'GAUGGAACUUGACUACGUAAAUU'

You already saw in Chapter 1 how to write a program to accept a DNA sequence from the command line or a file and print a result, so you won’t be learning much if you do that again. I’ll make this program more interesting by tackling a very common pattern found in bioinformatics. Namely, I’ll show how to process one or more input files and place the results in an output directory. For instance, it’s pretty common to get the results of a sequencing run back as a directory of files that need to be quality checked and filtered, with the cleaned sequences going into some new directory for your analysis. Here the input files contain DNA sequences, one per line, and I’ll write the mRNA sequences into like-named files in an ...

Get Mastering Python for Bioinformatics now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.