Mastering Python for Bioinformatics


by Ken Youens-Clark
May 2021
Intermediate to advanced
454 pages
English
O'Reilly Media, Inc.

Chapter 11. Finding a Protein Motif: Fetching Data and Using Regular Expressions

We’ve spent quite a bit of time now looking for sequence motifs. As described in the Rosalind MPRT challenge, shared or conserved sequences in proteins imply shared functions. In this exercise, I need to identify protein sequences that contain the N-glycosylation motif, which Rosalind writes as N{P}[ST]{P}: an N, followed by any residue except P, then either S or T, then any residue except P. The input to the program is a list of protein IDs that will be used to download the sequences from the UniProt website. After demonstrating how to download the data both manually and programmatically, I’ll show how to find the motif with a regular expression and with a manually written solution.
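Downloading each sequence programmatically can be sketched with nothing but the standard library. This is a minimal illustration under stated assumptions, not the chapter's own code: the URL template `UNIPROT_URL` assumes UniProt's current REST endpoint, the helper names `http_get` and `download_fasta` are hypothetical, and splitting IDs such as `P07204_TRBM_HUMAN` at the first underscore assumes the accession comes first. Files already present in the download directory are reused instead of being fetched again.

```python
import os
import urllib.request
from typing import Callable

# ASSUMPTION: UniProt's current REST endpoint; the chapter itself may use another URL.
UNIPROT_URL = 'https://rest.uniprot.org/uniprotkb/{}.fasta'


def http_get(url: str) -> bytes:
    """Fetch a URL with the standard library and return the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_fasta(prot_id: str,
                   download_dir: str,
                   fetch: Callable[[str], bytes] = http_get) -> str:
    """Return the local path of prot_id's FASTA file, downloading it only
    when it is not already cached in download_dir (hypothetical helper)."""
    os.makedirs(download_dir, exist_ok=True)
    local_path = os.path.join(download_dir, prot_id + '.fasta')

    if not os.path.isfile(local_path):
        # ASSUMPTION: Rosalind-style IDs such as "P07204_TRBM_HUMAN" put the
        # UniProt accession before the first underscore.
        accession = prot_id.split('_')[0]
        data = fetch(UNIPROT_URL.format(accession))
        with open(local_path, 'wb') as fh:
            fh.write(data)

    return local_path
```

Calling `download_fasta('B5ZC00', 'fasta')` would hit the network once and then reuse the cached file on later runs. The `fetch` parameter exists so that a test can substitute a fake downloader instead of contacting UniProt.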

You will learn:

  • How to programmatically fetch data from the internet

  • How to write a regular expression to find the N-glycosylation motif

  • How to manually find the N-glycosylation motif
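The last two items can be sketched side by side. In this illustration (function names are hypothetical), the regular expression wraps `N[^P][ST][^P]` in a zero-width lookahead so that overlapping occurrences are all reported, while the manual version slides a four-residue window along the sequence and tests each position. Both return 1-based positions, which is how Rosalind reports motif locations.

```python
import re
from typing import List

# N{P}[ST]{P}: N, then any residue but P, then S or T, then any residue but P.
# The pattern sits inside a zero-width lookahead so overlapping hits are found.
MOTIF = re.compile(r'(?=N[^P][ST][^P])')


def find_motif_regex(seq: str) -> List[int]:
    """Return 1-based start positions of every (possibly overlapping) motif."""
    return [match.start() + 1 for match in MOTIF.finditer(seq)]


def find_motif_manual(seq: str) -> List[int]:
    """Slide a four-residue window along seq and test each position by hand."""
    positions = []
    for i in range(len(seq) - 3):
        first, second, third, fourth = seq[i:i + 4]
        if first == 'N' and second != 'P' and third in 'ST' and fourth != 'P':
            positions.append(i + 1)  # report 1-based positions, as Rosalind does
    return positions


if __name__ == '__main__':
    for seq in ['NNTST', 'NPSTNASA']:
        print(seq, find_motif_regex(seq), find_motif_manual(seq))
```

For `NNTST` both functions return `[1, 2]`, because the motifs starting at positions 1 and 2 overlap. Without the lookahead, `re.findall(r'N[^P][ST][^P]', 'NNTST')` returns only `['NNTS']`, since each match consumes its four characters before the search resumes.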

Getting Started

All the code and tests for this program are located in the 11_mprt directory. To begin, copy the first solution to the program mprt.py:

$ cd 11_mprt
$ cp solution1_regex.py mprt.py

Inspect the usage:

$ ./mprt.py -h
usage: mprt.py [-h] [-d DIR] FILE

Find locations of N-glycosylation motif

positional arguments:
  FILE                  Input text file of UniProt IDs

optional arguments:
  -h, --help            show this help message and exit
  -d DIR, --download_dir DIR
                        Directory for ...
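A command-line interface producing usage like the one above could be declared with `argparse`. This is a hedged sketch, not the book's code: the `Args` tuple and `get_args` function are illustrative names, and because the help text for `-d` is truncated above, both the help string `'Directory for downloads'` and the default `'fasta'` are hypothetical placeholders.

```python
import argparse
from typing import List, NamedTuple, Optional, TextIO


class Args(NamedTuple):
    """Typed container for the command-line arguments."""
    file: TextIO
    download_dir: str


def get_args(argv: Optional[List[str]] = None) -> Args:
    """Parse arguments mirroring the usage shown for mprt.py."""
    parser = argparse.ArgumentParser(
        description='Find locations of N-glycosylation motif',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # Positional FILE argument, opened for reading in text mode
    parser.add_argument('file',
                        metavar='FILE',
                        type=argparse.FileType('rt'),
                        help='Input text file of UniProt IDs')

    # HYPOTHETICAL help text and default: the original help for -d is truncated
    parser.add_argument('-d',
                        '--download_dir',
                        metavar='DIR',
                        type=str,
                        default='fasta',
                        help='Directory for downloads')

    args = parser.parse_args(argv)
    return Args(args.file, args.download_dir)
```

Using `argparse.FileType` means a missing input file is rejected with a usage error before any downloading starts.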



ISBN: 9781098100872