CHAPTER 2 Retrieval of Protein Sequence from UniProtKB

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

2.1 INTRODUCTION

The Universal Protein Resource (UniProt) is a database of protein sequences and functional information, created by combining the Protein Information Resource‐Protein Sequence Database (PIR‐PSD), Swiss‐Prot, and TrEMBL databases. UniProt (www.uniprot.org/) has two sections: the Swiss‐Prot knowledgebase, which holds fully annotated, manually curated records, and the TrEMBL protein database, which holds computationally analyzed records on proteins.

2.1.1 Features of UniProtKB/Swiss‐Prot

  • Non‐redundancy of records.
  • High level of integration of data deposited in different related databases (NCBI‐GenBank, EMBL, DDBJ for translated coding sequences).
  • High level of manual curation.
  • Contains more than 0.25 million entries.

2.1.2 Features of UniProtKB/TrEMBL

  • Translations of nucleotide coding sequences (CDS) in EMBL/NCBI‐GenBank/DDBJ.
  • Automatic annotation.
  • Contains more than 3.3 million entries.
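The two sections can also be told apart in downloaded sequence files: in UniProt's FASTA format, the header of a Swiss‐Prot record begins with "sp|" and that of a TrEMBL record with "tr|". The following is a minimal sketch of that check; the function name uniprot_section and the placeholder headers are illustrative assumptions, not part of UniProt itself.

```python
def uniprot_section(fasta_header):
    """Return the UniProtKB section implied by a FASTA header line.

    UniProt FASTA headers have the form
    '>db|Accession|EntryName description', where db is 'sp'
    (Swiss-Prot) or 'tr' (TrEMBL).
    """
    # Drop the leading '>' if present, then take the text before the first '|'.
    prefix = fasta_header.strip().lstrip(">").split("|", 1)[0]
    return {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(prefix, "unknown")


# Placeholder headers (hypothetical, not real entries):
print(uniprot_section(">sp|ACCESSION|ENTRY_NAME Some protein"))
print(uniprot_section(">tr|ACCESSION|ENTRY_NAME Some protein"))
```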

2.2 OBJECTIVE

To download the amino acid sequence of a protein (say, the taurine sex‐determining region Y‐encoded (SRY) peptide).

2.3 PROCEDURE

  1. Open the Expert Protein Analysis System (ExPASy) homepage: http://www.expasy.org/
  2. Locate the drop‐down menu “Query all databases” at the upper center of the page and click on “Proteomics” (to obtain information from all relevant databases, such as PROSITE, STRING, ENZYME, UniProtKB, etc.), or else select UniProtKB ...
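The steps above use the web interface. As a programmatic alternative, the same sequence can be requested in FASTA format from the UniProt REST service. The sketch below is not part of the original procedure: it assumes the search endpoint at rest.uniprot.org, the query fields gene and organism_id, and NCBI taxonomy ID 9913 for Bos taurus (taurine cattle). The function names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UniProt REST search service.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(gene, taxon_id, reviewed_only=False):
    """Build a UniProtKB search URL that returns FASTA records."""
    query = f"gene:{gene} AND organism_id:{taxon_id}"
    if reviewed_only:
        # Restrict the search to Swiss-Prot (reviewed) entries.
        query += " AND reviewed:true"
    return UNIPROT_SEARCH + "?" + urlencode({"query": query, "format": "fasta"})


def fetch_text(url, timeout=30):
    """Download the response body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Example use (assumption: 9913 is the taxonomy ID for Bos taurus):
#   url = build_search_url("SRY", 9913)
#   for header, sequence in parse_fasta(fetch_text(url)):
#       print(header, len(sequence))
```

Keeping URL construction and FASTA parsing separate from the download step lets both be checked without a network connection. A search may return several records (for example, both reviewed and unreviewed entries), so the result is handled as a list.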
