Basic Applied Bioinformatics
by Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, Mir Asif Iquebal
CHAPTER 1 Retrieval of Sequence(s) from the NCBI Nucleotide Database
CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
1.1 INTRODUCTION
The NCBI Nucleotide database (http://www.ncbi.nlm.nih.gov/nucleotide/) is an archive of gene, transcript, and genomic DNA fragment sequences. It integrates records from several public repositories, including GenBank (the NIH genetic sequence database), RefSeq (annotated, non‐redundant reference sequences for genomic DNA, transcripts, and proteins), TPA (Third Party Annotation: nucleotide sequence data annotated by parties other than the original submitters), and PDB (Protein Data Bank: a repository of 3D structures of proteins and nucleic acids). Through the International Nucleotide Sequence Database Collaboration (INSDC), the three major nucleotide data repositories, namely NCBI, DDBJ, and EMBL, exchange their data, so a sequence deposited in any one of them becomes available in all three.
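Besides the web interface, records in the Nucleotide database can be retrieved programmatically. The sketch below assumes NCBI's Entrez E-utilities `efetch` service with `db=nucleotide`, `rettype=fasta`, and `retmode=text`; the helper names `efetch_url`, `parse_fasta`, and `fetch_fasta` are hypothetical and only illustrate the idea. NCBI asks scripted users to identify their tool and e-mail and to keep to a few requests per second, so check the E-utilities documentation before downloading in bulk.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for one record in the nucleotide database."""
    params = {"db": "nucleotide", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Split FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records


def fetch_fasta(accession: str) -> dict[str, str]:
    """Download one record as FASTA and parse it (requires network access)."""
    with urlopen(efetch_url(accession)) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

A call such as `fetch_fasta("ACCESSION")`, with `ACCESSION` replaced by a real accession number, returns a dictionary mapping the FASTA header line to the full sequence. Keeping URL building and parsing in separate functions lets the parser be reused on FASTA files saved from the web interface.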
A brief description of the NCBI databases is given in Appendix A, “NCBI Database: A Brief Account,” at the end of this book.
1.2 COMPONENTS OF THE NCBI NUCLEOTIDE DATABASE
- GenBank: An annotated collection of all publicly available nucleotide and in silico translated protein sequences.
- EST database: Maintains expressed sequence tags (ESTs), which are short, single‐pass reads from mRNA (cDNA). "Single‐pass" means that each fragment was read by the sequencer only once, that is, the sequencing reaction was loaded in a lane only once.
- GSS database: A database of genome survey ...