Blast by Ian Korf, Mark Yandell, Joseph Bedell

The unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

Here's a key to the markup:
  [page-number]: serious technical mistake
  {page-number}: minor technical mistake
  <page-number>: important language/formatting problem
  (page-number): language change or minor formatting problem
  ?page-number?: reader question or request for clarification

This page was updated June 12, 2008.

UNCONFIRMED errors and comments from readers:

[47] Figure 3.6; Local Alignment: Smith-Waterman
Error: Running Blast_example3-2.pl returns a different alignment matrix from the one displayed in Figure 3.6. Iterating through the score and pointer data returns the following matrices:

    >perl Blast_example3-2.pl COELACANTH PELICAN
    1: 0 0 0 0 0 0 0 0 0 0
    2: 0 0 1 0 0 0 0 0 0 0
    3: 0 0 0 2 1 0 0 0 0 0
    4: 0 0 0 1 1 0 0 0 0 0
    5: 1 0 0 0 0 2 1 0 0 0
    6: 0 0 0 0 1 1 3 2 1 0
    7: 0 0 0 0 0 0 2 4 3 2
    End Of Score Matrix..
    1:0 0 0 0 0 0 0 0 0 0
    2:0 0 \ 0 0 0 0 0 0 0
    3:0 0 0 \ - 0 0 0 0 0
    4:0 0 0 | \ 0 0 0 0 0
    5:\ 0 0 0 0 \ - 0 0 0
    6:0 0 0 0 \ | \ - - 0
    7:0 0 0 0 0 0 | \ - -
    End Of Direction Matrix..

    # The result is still the same
    ELACAN
    ELICAN

(50) second paragraph, last sentence;
O(n2) should be O(n^2) or "order n-squared"

[56] top;
equations for H are missing 1/p

{58} equation 4-3;
I think there should be parentheses around p_i*p_j so that the equation reads log(q_ij/(p_i*p_j)). Without them the equation as it stands is technically log((q_ij/p_i)*p_j).

{59} Fig. 4-2;
Matrix is not 20 x 20. Valine is left out of the rows. There is no way to find out the value of V to V.

(59) Figure 4-2. Blosum62 scoring matrix;
Missing last column, corresponding to the amino acid V.

{61} Equation 4-4;
The sum Sum_{i=1..n} Sum_{j=1..i} q_ij is not the sum of ALL frequencies; it is only the sum of the lower triangle of the q_ij. For example, if n=4, then q_12 is not a member of the sum in the formula.
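The index range questioned in the Equation 4-4 report can be checked numerically. The following is a minimal sketch in Python (used here only for illustration; the book's listings are in Perl). It assumes a hypothetical four-letter alphabet with uniform composition, and it tries two readings of q_ij, both labeled as assumptions in the comments: q_ij as the frequency of an unordered pair, and q_ij as the frequency of an ordered pair. The helper name lower_triangle_pairs is hypothetical.

```python
from fractions import Fraction

def lower_triangle_pairs(n):
    """Index pairs (i, j) visited by Sum_{i=1..n} Sum_{j=1..i}, 1-based."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, i + 1)]

n = 4
p = Fraction(1, n)  # assumed uniform letter frequency
pairs = lower_triangle_pairs(n)

# Reading A (assumption): q_ij is the frequency of the unordered pair {i, j},
# so an off-diagonal term carries both orientations: q_ij = 2 * p_i * p_j.
unordered_total = sum(p * p if i == j else 2 * p * p for i, j in pairs)

# Reading B (assumption): q_ij is the frequency of the ordered pair (i, j),
# so every term is p_i * p_j and the upper half of the matrix is left out.
ordered_total = sum(p * p for i, j in pairs)

print(len(pairs))        # number of terms, n * (n + 1) / 2
print((1, 2) in pairs)   # whether q_12 is visited
print(unordered_total)
print(ordered_total)
```

Under reading A the lower triangle, diagonal included, covers every unordered pair once and totals 1. Under reading B it totals 5/8, so whether the printed index range is complete depends on how q_ij is defined.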
{62} Equations 4-7, 4-8, 4-9;
The j index should go from 1 to n, not from 1 to i. You want to sum ALL scores, not just the lower-triangle ones. Example 4-1 on page 63 confirms this. If you sum only the lower triangle, then $expected_score = $match * 0.25 + $mismatch * 0.75 / 2.

AUTHOR: The equations are correct as they stand. The reason is that in every case the value of x_ij is the same as x_ji. If you take both halves of the diagonal then you are going to be off by a factor of two.

[62] Equation 4-9;
According to Henikoff, S. and Henikoff, J.G. 1992 (http://www.pnas.org/cgi/reprint/89/22/10915.pdf), "H" should correspond to a measure of mutual information or relative entropy. It appears to me that there should not be a minus sign in front of the right-side summation. A related point is that "H" in this text is not necessarily positive: in exceptional cases where there is no dependence between the two matrices, "H" can be zero. I think that relative entropy may not be the most suitable term for this equation, although it is used in many references on scoring sequence alignments. Relative entropy should be considered a measure of the difference between two probability distributions, whereas mutual information, as expressed in the form of Equation 4-9, can be taken intuitively as a measure of dependence between two random variables. Scoring sequence alignments appears to correspond to the latter situation.

{103} Figure 7-2;
I have a question about the sum score formula. The summation runs from i = 1 to r. However, the individual S_i are not used in the summation. Only the constant S_r is used; i is not referenced anywhere in the summation. Is this correct?
{112} Code listing at bottom of page, 3rd paragraph;
In the Perl code listing, variable $n may be incorrectly commented:
    '#actual length of query'
I think the comment for $n should be:
    '# actual number of letters in database'

(170) 2nd paragraph, right below "Command-Line Tutorial";
The last part of the URL provided for the examples has "BLAST" in all upper case. It turns out the URL is case sensitive, and this should be all lower case, i.e., "blast", to avoid the "404 Not Found" error.

(173) 10.3.1.5. blastx;
    blastall -p blastx -d globins -i fugu_genomic > ncbi-blastx_test
should be:
    blastall -p blastx -d globins -i fugu_globin > ncbi-blastx_test

[310] line starting '$expect = "1$expect' ...;
This line should be changed FROM:
    $expect = "1$expect" if $expect = ~/^e/;
TO:
    $expect = "1$expect" if $expect =~ /^e/;
The printed version will not correctly use your desired minimum expect value to filter the table entries.
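The intent of the corrected line on page 310 can be sketched as follows. This is a Python illustration rather than the book's Perl, and the names normalize_expect, passes_filter, and threshold are hypothetical. It assumes the purpose of the line is to make an abbreviated E-value such as "e-100" parse as a number by prefixing "1". In Perl, the printed `= ~` parses as an assignment of the bitwise negation of a match against $_, so $expect is overwritten instead of being tested; the corrected `=~` binds the match to $expect.

```python
import re

def normalize_expect(expect: str) -> str:
    """Prefix "1" when the E-value string starts with "e" (e.g. "e-100")."""
    # Mirrors the corrected Perl: $expect = "1$expect" if $expect =~ /^e/;
    if re.match(r"e", expect):  # re.match anchors at the start, like /^e/
        return "1" + expect
    return expect

def passes_filter(expect: str, threshold: float) -> bool:
    """Hypothetical filter: keep a hit whose E-value is at or below threshold."""
    return float(normalize_expect(expect)) <= threshold

print(normalize_expect("e-100"))
print(normalize_expect("2e-05"))
print(passes_filter("e-100", 1e-10))
print(passes_filter("0.5", 1e-10))
```

Without the normalization step, a string such as "e-100" cannot be converted to a number at all in Python, which is the kind of failure the prefix guards against.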