Mastering Python for Bioinformatics

by Ken Youens-Clark
May 2021
Intermediate to advanced
454 pages
10h 42m
English
O'Reilly Media, Inc.

Chapter 2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files

To express the proteins necessary to sustain life, regions of DNA must be transcribed into a form of RNA called messenger RNA (mRNA). While there are many fascinating biochemical differences between DNA and RNA, for our purposes the only difference is that every T, representing the base thymine, in a sequence of DNA needs to be changed to the letter U, for uracil. As described on the Rosalind RNA page, the program I’ll show you how to write will accept a string of DNA like ACGT and print the transcribed mRNA ACGU. I can use Python’s str.replace() method to accomplish this in one line:

>>> 'GATGGAACTTGACTACGTAAATT'.replace('T', 'U')
'GAUGGAACUUGACUACGUAAAUU'

You already saw in Chapter 1 how to write a program to accept a DNA sequence from the command line or a file and print a result, so you won’t be learning much if you do that again. I’ll make this program more interesting by tackling a very common pattern found in bioinformatics. Namely, I’ll show how to process one or more input files and place the results in an output directory. For instance, it’s pretty common to get the results of a sequencing run back as a directory of files that need to be quality checked and filtered, with the cleaned sequences going into some new directory for your analysis. Here the input files contain DNA sequences, one per line, and I’ll write the mRNA sequences into like-named files in an output directory. ...
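
The input-files-to-output-directory pattern described above might look something like the following. This is only a minimal sketch, not the program the chapter goes on to build: the function names transcribe() and transcribe_file() are hypothetical, and it assumes each input file holds one uppercase DNA sequence per line.

```python
import os
import tempfile


def transcribe(dna: str) -> str:
    """Return the mRNA for a DNA string by replacing each T with U."""
    return dna.replace('T', 'U')


def transcribe_file(in_path: str, out_dir: str) -> str:
    """Write the transcribed lines of in_path to a like-named file in out_dir.

    Hypothetical helper: assumes one DNA sequence per line of input.
    """
    # Create the output directory if it does not already exist
    os.makedirs(out_dir, exist_ok=True)

    # Reuse the input file's base name so the output is "like-named"
    out_path = os.path.join(out_dir, os.path.basename(in_path))

    with open(in_path) as in_fh, open(out_path, 'w') as out_fh:
        for line in in_fh:
            # Strip the trailing newline, transcribe, then restore the newline
            out_fh.write(transcribe(line.rstrip()) + '\n')

    return out_path


if __name__ == '__main__':
    # Demonstrate with a throwaway input file in a temporary directory
    with tempfile.TemporaryDirectory() as tmp_dir:
        in_path = os.path.join(tmp_dir, 'input.txt')
        with open(in_path, 'w') as fh:
            fh.write('GATGGAACTTGACTACGTAAATT\n')

        out_path = transcribe_file(in_path, os.path.join(tmp_dir, 'out'))
        with open(out_path) as fh:
            print(fh.read(), end='')
```

Because str.replace() returns a new string rather than changing the original, each line is read, transcribed, and written out without modifying the input file.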


Publisher Resources

ISBN: 9781098100872