Chapter 12. BLAST

Searching for sequence similarity is a central task in biological research. For instance, a researcher who has isolated a potentially important DNA or protein sequence wants to know if it’s already been identified and characterized by another researcher. If it hasn’t, the researcher wants to know if it resembles any known sequence from any organism. This information can provide vital clues as to the role of the sequence in the organism under study. And when no such resemblance is found, that absence is evidence that the sequence may belong to a new class of genes or gene products.

The Basic Local Alignment Search Tool (BLAST) is one of the most popular software tools in biological research. It tests a query sequence against a library of known sequences in order to find similarity. BLAST is actually a collection of programs with versions for query-to-database pairs such as nucleotide-nucleotide, protein-nucleotide, protein-protein, nucleotide-protein, and more.
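The query-to-database pairs just listed can be written down as a small lookup table. The following is only an illustrative sketch in Python: the program names are the standard BLAST ones, but the `BLAST_PROGRAMS` table and the `pick_program` helper are hypothetical names invented for this example.

```python
# Map (query type, database type) to the BLAST program that handles the pair.
# tblastx (translated nucleotide query against a translated nucleotide
# database) is left out because it shares its key with blastn.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated first
    ("protein", "nucleotide"): "tblastn",  # database is translated first
}


def pick_program(query_type, database_type):
    """Return the program name for a query/database pair (hypothetical helper)."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type} query "
            f"against {database_type} database"
        )


if __name__ == "__main__":
    print(pick_program("nucleotide", "nucleotide"))
```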

This chapter examines the output from the nucleotide-nucleotide version of the program, BLASTN. For simplicity’s sake, I’ll refer to it here as BLAST. The main goal of this chapter is to show how to write code to parse a BLAST output file using regular expressions. The code is simple and basic, but it does the job. Once you understand the basics, you can build more features into your parser or obtain one of the fancier BLAST output parsers that are available via the Web. In either case, you’ll know enough about ...
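To give a feel for regular-expression parsing of this kind, here is a minimal sketch in Python. It is not the chapter's own parser. It assumes the classic plain-text report layout, in which each hit starts with a `>` header line followed by `Score`, `Expect`, and `Identities` lines. The sample report is a hand-written, abbreviated fragment with invented identifiers and numbers, and `parse_hits` is a hypothetical name.

```python
import re

# Hand-written fragment imitating a plain-text BLASTN report.
# All identifiers, scores, and counts are invented for illustration.
SAMPLE_REPORT = """\
>gb|EXAMPLE1| Hypothetical example sequence one
          Length = 1200

 Score = 87.7 bits (44), Expect = 2e-15
 Identities = 50/52 (96%)
 Strand = Plus / Plus

>gb|EXAMPLE2| Hypothetical example sequence two
          Length = 830

 Score = 40.1 bits (20), Expect = 0.45
 Identities = 23/24 (95%)
 Strand = Plus / Minus
"""

SCORE_RE = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<raw>\d+)\),"
    r"\s*Expect\s*=\s*(?P<expect>\S+)"
)
IDENT_RE = re.compile(r"Identities\s*=\s*(?P<matched>\d+)/(?P<total>\d+)")


def parse_hits(report):
    """Return one dict per hit found in a plain-text report."""
    hits = []
    # Each hit begins at a line starting with '>'; drop the text before the first.
    for chunk in re.split(r"^>", report, flags=re.MULTILINE)[1:]:
        score = SCORE_RE.search(chunk)
        ident = IDENT_RE.search(chunk)
        if score is None or ident is None:
            continue  # skip chunks that do not look like a complete hit
        expect = score.group("expect")
        if expect.startswith("e"):
            expect = "1" + expect  # some reports print 'e-100' for 1e-100
        hits.append({
            "header": chunk.splitlines()[0].strip(),
            "bits": float(score.group("bits")),
            "expect": float(expect),
            "identities": (int(ident.group("matched")), int(ident.group("total"))),
        })
    return hits


if __name__ == "__main__":
    for hit in parse_hits(SAMPLE_REPORT):
        print(hit["header"], hit["bits"], hit["expect"], hit["identities"])
```

Splitting the report on the `>` header lines before matching keeps each regular expression short and ensures that a score is never paired with the wrong hit. A real report has more sections (the program header, one-line summaries, the alignments themselves, and trailing statistics) that this sketch ignores.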
