Developing Bioinformatics Computer Skills
by Cynthia Gibas and Per Jambeck

Unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

This page was updated October 21, 2002.

Here's a key to the markup:

[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

UNCONFIRMED errors and comments from readers:

(1) Chapter 1; I read the example chapter online; it's Chapter 1, but there are no page numbers. Here's the phrase: "With a commonly used computer program called fsBLAST". There's no such thing as a program called fsBLAST; it's BLAST.

(7) middle line between two sequence alignments "Query: 24" and "Sbjct: 17"; In the book's first visual display of a sequence alignment on page 7, the middle line showing the relationship between the eyeless gene and the aniridia gene is formatted two spaces too far to the left, so that HSGVNQLGGVFV GRPLPDSTRQKIVELAHSGARPCDISRILQVSN, which should begin at base pair 15, starts at base pair 13. This error could be rather confusing to someone with little experience of sequence alignments who didn't pick up on it, since the text explicitly says, "If there is a letter on the middle line, the sequences match exactly at that position... If there is nothing on the middle line, the two sequences don't match at that position." Because of the formatting error, this statement does not hold for the first line of the alignment, though it should.

[25] last paragraph; This section is about the translation of mRNA to protein, yet this paragraph says, "...the genetic code is the code that translates DNA into protein" and "it takes three bases of DNA...". It goes on to say, "Figure 2-4 shows how RNA is translated into protein".
There seems to be confusion about whether DNA, mRNA, or RNA is translated into protein.

{26} Figure 2-4; tRNA translation of GAU is shown as mapping to Gly, yet Figure 2-3 shows GAU as mapping to Asp. This may confuse someone who is not familiar with the genetic code.

(36) 1st paragraph, last sentence; The word "protein" begins the sentence and should be capitalized: "Protein structure alignment tools are introduced in Chapter 10."

<56> JDK/JRE; The text reads: "to use Java-based tools such as the Jalview sequence editor we discuss in Chapter 4, Files and Directories in Unix." Jalview is discussed in Chapter 8 (page 196).

(67) 3rd paragraph; The directory "home/jambeck/mustelidae" should be changed to "/home/jambeck/mustelidae".

{74} long format of ls (code); "Mar5" should be changed to "Mar 5".

(chapters 3, 4, 5) all; Recurring inconsistencies: the authors do not differentiate between a file and a filename. On pages 100, 101, 102, and 103, the Usage parts use "file", "files", "filename", and "filenames", and even "filename(s)" on page 109.

{95} seventh paragraph; "meercat.txt" doesn't exist. It should be changed to "meercat10.txt".

[98] Status line; It is a colon (:), not a semicolon (;), that is used to get to the status line in vi. The error occurs twice in the first paragraph on page 98.

[99] 2nd list item; "r]" should be ":r".

[101] last section; "- number" should be "-number".

{102} 3rd paragraph; "num" should be "number".

?103? second half; If you use cut -f 1-2 sequence_data on the described file, you will get both fields (both columns) sent to stdout. At least you will get ATC TAC (the first line), but never AAT TAC, will you?

[103] command line example after 3rd paragraph; The csplit example %csplit -f dbrecord. -n 6 fastadbfile /^>/ splits the input file into just two parts, splitting at the first occurrence of ">". To split the input file into many single-sequence files, the repeat argument "{*}" has to be added (confirmed with csplit from GNU textutils 2.0.11).

?110?
options; --help and --version are usable for all the given commands. Why did the authors specifically add them here?

{112-3} last paragraph; The amino acid sequence in the seqres file starts at column 14 and ends at column 64, so the command must be: cut -c14-64 seqres > seqs to get the output shown at the top of the next page. This is a little inconsistent and confusing, like the known error on page 7. BTW: the same numbering error occurs on page 104 in the piped command "% grep SEQRES pdbfile| cut -c....".

[113] script; The temporary file is not removed, and since the authors are dealing with GenBank files (several Mb at a time), they really need to remove it in the foreach loop. This is a major issue. Also, there isn't any indentation: if you want biologists to develop computer skills, you have to explain to them how to indent scripts and Perl programs.

{115} first paragraph; The directory "/home/httpd/html" is too specific. It would be better to explain things using httpd.conf and grep '^ServerRoot' in that file to obtain the appropriate directory.

{115} Usages; Inconsistencies: telnet full.hostname and ftp full.host.name.edu.

(115) ftp section; "The File Transfer Protocol (ftp)" should be "The File Transfer Protocol (FTP)". ftp is a program. FTP is the protocol.

{116} 3rd paragraph from the bottom; % xhost + is an inappropriate command. It should be: % xhost +remote_hostname to prevent potential security problems.

?120? 3rd paragraph; Can you briefly explain this sentence: "Even on a single processor system, it's possible to have multiple processes running concurrently as long as there is enough space for both jobs to remain in memory." I am not sure what you really mean.

[122] list of top options; "-d" should be "-d delay".

{123} last paragraph; Since the priority value is in the range 1-19, writing "but unless you are root, you are limited to raising its priority to 1" doesn't make sense.
(126) last paragraph; "/zeus:" should be "zeus:".

(134) 5th paragraph; The words marked with carets should be in italic:

If you search for protein structure on Excite....
                  ^^^^^^^
Google defaults to AND, so you'll find only references that contain protein and structure....
                                                                    ^^^^^^^     ^^^^^^^^^

(142) 3rd paragraph; The first sentence reads: "The standard reduced representation of the 3D structure of biomolecule consists of...." It should read: "The standard reduced representation of the 3D structure of biomolecules consists of...."

[154] 1st paragraph, 2nd sentence; The forward reference promising to discuss the differences between PDB and mmCIF in Chapter 12 is not kept. Checking the index, neither PDB nor mmCIF has any entries between pages 331 and 349 -- all of Chapter 12. Actually reading through the chapter confirms that neither of those formats is discussed there. Similarly, scanning through the pages where "mmCIF" appears (according to the index) does not give me any further details about differences between the two, other than that mmCIF is newer and the 'community is still attached' to PDB.

(160) Bottom paragraph; The last sentence reads: "But keep this fact in mind: the single-letter sequence code that describes DNA and is a simplified representation...." It should read: "But keep this fact in mind: the single-letter sequence code that describes DNA is a simplified representation...."

(216) Section: A Word About.... 1st paragraph, last sentence; The sentence promises to give an example of file-format conversion in Chapter 12. The examples in Chapter 12 deal with string/pattern searching/matching and parsing BLAST data to compile a report. If the BLAST parsing is the intended example, some other phrasing should be used instead of "file-format conversion" on page 216. Almost anyone reading those words will think of something like converting GIF to JPEG (image formats) rather than what's actually provided in Chapter 12.
Or perhaps the example was omitted at printing and the reference to it here and elsewhere was not also removed?

(218) Figure 9-1; N should be H on C with R2.

(236) 2nd full paragraph, line 2; "forwRasMolard" should be "forward".

[245] 5th paragraph - section CATH; The URL for CATH should read http://www.biochem.ucl.ac.uk/bsm/cath_new/ (the original is missing the ucl portion of the hostname).

{246} last paragraph; You mention that you do not know of any software that allows a user to "create a unique data set based on your own choice of parameters". Actually, our website does allow this. There is a form where one can choose resolution, sequence identity, R-factor, and length cutoffs, as well as whether to include C-alpha only or NMR structures. The website address is: http://www.fccc.edu/research/labs/dunbrack/culledpdb.html

(294) 2nd paragraph, 2nd to last sentence; "pheotype" should probably be "phenotype".

(301) 2nd paragraph; The text at the end of paragraph 2 states: "...the fragment of a genome can be ordered into a highly specific map (see figure 11-2)." Figure 11-2 (page 298), however, is the detector output for a modern sequencing experiment and does not show a highly specific map. Other figures in this chapter also do not fit this description, so it looks like a figure is missing from the book.

?324? 3rd paragraph; Where would one locate PATH-DB?
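
Regarding the [103] report above: the repeated split it describes can be sketched as follows, assuming GNU csplit. The sample file and its three records are made up for illustration, and the -s (quiet) and -z (drop empty pieces) flags are additions not mentioned in the report; only the prefix, the digit count, the pattern, and "{*}" come from it.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical three-record FASTA file (illustrative data, not from the book).
printf '%s\n' '>seq1' 'ATGC' '>seq2' 'GGCC' 'TTAA' '>seq3' 'AAAA' > fastadbfile

# '/^>/' splits before a line starting with ">"; '{*}' repeats that pattern
# until the input is exhausted. -z drops the empty piece created because the
# very first line already matches; -s suppresses the byte counts.
csplit -s -z -f dbrecord. -n 6 fastadbfile '/^>/' '{*}'

# List the resulting pieces, one per record.
ls dbrecord.*
```

With this sample input, the run leaves three dbrecord.* files, each holding one header line and its sequence lines.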