Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases
By Scott Markel, Darryl León
January 2003
Pages: 302
Series: In a Nutshell
ISBN 10: 0-596-00494-X
ISBN 13: 9780596004941






Book description
Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases pulls together all of the vital information about the most commonly used databases, analytical tools, and tables used in sequence analysis. The book contains details and examples of the common database formats (GenBank, EMBL, SWISS-PROT) and the GenBank/EMBL/DDBJ Feature Table Definitions. It also provides the command line syntax for popular analysis applications such as ReadSeq, MEME/MAST, BLAST, ClustalW, and the EMBOSS suite, as well as tables of nucleotide, genetic, and amino acid codes. Written in O'Reilly's enormously popular, straightforward "Nutshell" format, this book draws together essential information for bioinformaticians in industry and academia, as well as for students. If sequence analysis is part of your daily life, you'll want this easy-to-use book on your desk.
Full Description
Gene sequence data is the most abundant type of data available, and if you're interested in analyzing it, you'll find a wealth of computational methods and tools to help you. In fact, finding the data is not the challenge at all; rather it is dealing with the plethora of flat file formats used to process the sequence entries and trying to remember what their specific field codes mean. If you survive by surrounding yourself with well-thumbed hard copies of readme files or remembering exactly where to look for the details when you need them, then Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases is for you. This book is a handy resource, as well as an invaluable reference, for anyone who needs to know about the practical aspects and mechanics of sequence analysis.
Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases pulls together all of the vital information about the most commonly used databases, analytical tools, and tables used in sequence analysis. The book is partitioned into three fundamental areas to help you maximize your use of the content. The first section, "Databases," contains examples of flat files from key databases (GenBank, EMBL, SWISS-PROT), the definitions of the codes or fields used in each database, and the sequence feature types/terms and qualifiers for the nucleotide and protein databases.
The second section, "Tools," provides the command line syntax for popular applications such as ReadSeq, MEME/MAST, BLAST, ClustalW, and the EMBOSS suite of analytical tools. The third section, "Appendixes," concentrates on information essential to understanding the individual components that make up a biological sequence. The tables in this section include nucleotide and protein codes, genetic codes, and other relevant information.
Written in O'Reilly's enormously popular, straightforward "Nutshell" format, this book draws together essential information for bioinformaticians in industry and academia, as well as for students. If sequence analysis is part of your daily life, you'll want this easy-to-use book on your desk.
Featured customer reviews

Sequence Analysis in a Nutshell Review,
November 21 2003
Submitted by Patrick Fleury
Title: Sequence Analysis in a Nutshell (SAIAN)
Authors: Markel, S and Leon, D.
Publisher: O'Reilly and Associates
Year: 2003
The basic idea behind sequence analysis is the classification of DNA or protein sequences in terms of other known DNA or protein sequences. To take a simple case, suppose there is a laboratory team that decodes a section of human - or mouse or rat - DNA and finds it corresponds to a sequence of letters, perhaps something like AGTTCGATTGATTGCA. (This is a fairly small sequence.) The team might want to find out what is already known about this particular sequence. To do this, they would compare their sequence to a known database of sequences.
This database searching is not a trivial matter because, not only would they want to find out if there are any exact matches for their sequence, they might also want to find out if there are any approximate matches. Here, approximate takes on a new meaning because it not only means sequences that share a large number of exact matches, but also sequences where parts of their sequence appear separated by other letters. For example, if you consider the above sequence, it appears in the sequence
ATAGTAATTCGAGCTTTGAATTTTGCA
except that there are a few other letters interspersed within it. Or, they might be happy to find a sequence like the above except that some of the letters have been changed to other letters. For example, the sequence ATTTCGGTAGATGCA is the above sequence with a couple of random letter changes.
Such alignments, although highly unintuitive to the uninitiated, might be useful to the biological researcher.
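To make the two kinds of approximate match concrete, here is a minimal Python sketch. It is only an illustration: the function names `is_subsequence` and `edit_distance` are made up for this example, and tools such as BLAST use scoring schemes and heuristics that are far more sophisticated than either check. The first function tests whether the letters of one sequence appear, in order, inside another with extra letters interspersed. The second counts the single-letter insertions, deletions and substitutions needed to turn one sequence into another (the Levenshtein distance, a standard textbook measure rather than anything taken from the book).

```python
def is_subsequence(query, target):
    """Return True if the letters of query appear in target in the same order,
    possibly separated by other letters."""
    remaining = iter(target)
    # "letter in remaining" consumes the iterator up to the first match,
    # so each query letter must be found after the previous one.
    return all(letter in remaining for letter in query)


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-letter insertions,
    deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


# The sequences quoted in the review.
query = "AGTTCGATTGATTGCA"
gapped = "ATAGTAATTCGAGCTTTGAATTTTGCA"
changed = "ATTTCGGTAGATGCA"

print(is_subsequence(query, gapped))
print(edit_distance(query, changed))
```

The first check confirms the point made above: every letter of the short sequence can be found, in order, inside the longer one. The second gives a rough number for how far the altered sequence has drifted from the original.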
The team might also want to search not only databases of human DNA but also mouse DNA, rat DNA or perhaps even the worm, C. elegans.
I could go on with this, but I am merely trying to convince you that searching for one sequence among other sequences is not just a matter of bringing up a regular expression engine and letting it do its job. Instead, it's a very sophisticated process with lots of variations and parameters. Indeed, a lot of work has gone into tweaking the particular types of algorithm to use in such searches. These algorithms have been codified into families with names such as BLAST (Basic Local Alignment Search Tool), BLAT (BLAST-Like Alignment Tool) and ClustalW, and they are available in various places on the web.
This brings us to the volume under discussion. While it is possible to find out about these tools by searching the net, it would be useful to have one source that contained information about all of them in one easy-to-use format. This volume is that source.
This is another of O'Reilly's Nutshell series. Like the others in the series such as "Perl in a Nutshell", "C++ in a Nutshell" etc., the volume does not have as its main point the explication of the theory of sequence analysis. You will need to look elsewhere for that. Instead, it collects in one place a lot of information about the tools that are useful.
The first five chapters are devoted to clear descriptions of the common data formats you will run into in sequence analysis. These include FASTA, SWISS-PROT, GenBank and some of their relatives.
The next few chapters are devoted to the tools that make these analyses work. Surprisingly, BLAST, one of the most popular of the search algorithms, gets pretty short shrift. It only has about seven pages devoted to it. This might be due to the fact that O'Reilly recently published a book devoted entirely to BLAST. (There will be more about that later.)
The short space given to BLAST might also be because the authors wanted to save a lot of space for EMBOSS (European Molecular Biology Open Software Suite). EMBOSS is a suite of over 100 programs for sequence analysis that have been released as open source and whose code is available on the web. Anyone who wants to see real working C code for sequence analysis would do well to download these programs and study them. Markel and Leon devote almost 170 pages to this suite and all of its possible options and flags. By the way, the section on EMBOSS is really the only place where a particular programming language comes up in the book, and even there it doesn't really appear, because you need to download the code to see it. There is no Perl in "SAIAN".
Besides data formats and descriptions of tools, the book also has some other useful parts. For example, it has appendices devoted to amino acid and nucleotide tables, and genetic codes. It also lists a lot of websites where interested parties can go to find more information.
This book looks useful for anyone who would like to have a good single reference for sequence analysis tools.
All of the above notwithstanding, the book is a manual and sometimes reading it is just like reading a Unix man page. It may be informative, but, if you really want to know what is going on, you may need to look elsewhere for some further explanation. In particular, the treatment of BLAST in "SAIAN" does not really tell you what is going on. I would be much harder on "SAIAN" were it not for the fact that O'Reilly recently published another book titled simply "BLAST".
"BLAST", which was written by Ian Korf, Mark Yandell and Joseph Bedell, is subtitled "An Essential Guide to the Basic Local Alignment Search Tool" and it is indeed that. It contains not only a detailed introduction to BLAST, but also a short introduction to the theory behind BLAST. If you want to find out a little bit about basic genetics and how BLAST fits into sequence alignment, you could do a lot worse than read this book. It goes through the algorithms in some detail and actually shows you some elementary Perl code to carry out some of the algorithms. Furthermore, it contains an introduction to some of the statistical methods behind the code. (If you want to go deeply into the theory behind the algorithms, I recommend the book by Durbin, Eddy, Krogh and Mitchison referenced at the end of this review.)
In summary, "Sequence Analysis in a Nutshell" is a useful tool.
It collects in one place common data formats.
It also collects references to common algorithms such as BLAST and BLAT.
It has a large section on EMBOSS.
It has appendices on genetic codes and nucleotides.
It has a lot of references to URLs for finding more information and for downloading code.
It does not have enough about BLAST, but the book called "BLAST", also from O'Reilly, provides a very good reference for that tool along with other more theoretical information.
Finally, I want to point out that the animal on the cover of SAIAN works as symbolism on several levels. It is a liger, a cross between a male lion and a female tiger. (A cross between a male tiger and a female lion is called a tigon. Ah, the wonderful things you learn from reading the colophon of an O'Reilly book.) It is not only fitting that such a mixture of genes be on the cover of this book, but it is also nice to note that the authors work for LION bioscience.
Patrick Fleury
Books referenced in the above
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G., 1998, Biological Sequence Analysis, New York: Cambridge University Press
Korf, I., Yandell, M. and Bedell, J., 2003, BLAST, Sebastopol: O'Reilly
Markel, S. and Leon, D., 2003, Sequence Analysis in a Nutshell, Sebastopol: O'Reilly
Media reviews
"'Sequence Analysis in a Nutshell' provides a compact compilation of all the manual pages for commonly used sequence-analysis tools. This information is presented in a comfortable format with readable, informative fonts and a useful index...For readers who frequently refer to the manual pages or help pages, having this information available in a book on one's lap in a pleasant, readable font is a nice improvement."
--Kim Worley, American Journal of Human Genetics, November 2003
"If you find yourself confused as to which EMBOSS program to use for finding restriction enzyme sites in DNA or are puzzled by a list of BLAST command-line options, this book can help you...This easy-to-use volume is helpful to students, bioinformaticians, and academics who need a reference tool for sequence analysis often."
--"Genetic Engineering News," July 2003
"So what lifts this book above the level of Google searches? Firstly, the authors have done the hard work of gathering surprisingly scattered chunks of information together in one mass - a neat, glossy mass which should fit easily on a shelf near your desk. Secondly, their work is packaged and produced to the usual high O'Reilly standard of typesetting and layout: the text is clear, consistent and tasteful (with a striking cover image of a liger). Thirdly, by the simple act of making an informed selection, Markel and Leon, have served the field by more clearly defining the de facto standard bioinformatics standards and systems. This reference is sensibly aimed at the generalist, possibly in a commercial, administrative or service bioinformatics role who just needs to get things done. 'The liger book' would also be especially useful to relatively inexperienced bioinformaticians or ones only superficially familiar with the tools it covers, for example, students tackling a research project. Both groups in particular would find it a handy 'meta tool' to help themselves and help others."
--Damian Counsell, UK Unix Users' Group Newsletter, June 2003