Errata

Beginning Perl for Bioinformatics

Errata for Beginning Perl for Bioinformatics


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version Location Description Submitted by Date submitted
Printed Page 1
1

Can't download the examples and answers, Bro!

Anonymous   
Printed Page 55
Exercise 4.5

It seems to me that there is a misunderstanding of how transcription occurs (DNA to RNA
or RNA to DNA). I would appreciate your feedback.

Thanks
Hemant

# Perl: Exercise 4-5 Reverse Transcribing RNA into DNA

# The RNA
$RNA = 'ACGGGAGGACGGGAAAAUUACUACGGCAUUAGC';

print "\n$RNA\n";

# Transcribe RNA to DNA - Replace 'U' where there is 'T'.
# However, transcription occurs A -> T, U -> A, C -> G, G -> C
# So the correct answer is as below given the above RNA structure.

$DNA = $RNA;
$DNA =~ tr/ACGU/TGCA/;
print "\n$DNA\n";

# The correct DNA seq is TGCCCTCCTGCCCTTTTAATGATGCCGTAATCG

# The code below is incorrect

$DNA = $RNA;
$DNA =~ s/U/T/g;
print "\n$DNA\n";

# result is ACGGGAGGACGGGAAAATTACTACGGCATTAGC

exit;
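
To make the disagreement above concrete, here is a minimal sketch that applies three different operations to the RNA string from the report. The subroutine names are hypothetical and do not come from the book. Which operation the exercise intends is exactly the question the report raises, so the sketch only shows what each one yields.

```perl
use strict;
use warnings;

# RNA sequence copied from the report above.
my $rna = 'ACGGGAGGACGGGAAAAUUACUACGGCAUUAGC';

# 1. Same strand rewritten in the DNA alphabet (every U becomes T).
sub rna_to_dna_alphabet {
    my ($seq) = @_;
    (my $dna = $seq) =~ s/U/T/g;
    return $dna;
}

# 2. Base-by-base complement, kept in the same left-to-right order.
sub complement_of_rna {
    my ($seq) = @_;
    (my $dna = $seq) =~ tr/ACGU/TGCA/;
    return $dna;
}

# 3. Reverse complement: the complementary strand read in the opposite direction.
sub reverse_complement_of_rna {
    my ($seq) = @_;
    return scalar reverse complement_of_rna($seq);
}

print "U -> T only:        ", rna_to_dna_alphabet($rna), "\n";
print "complement:         ", complement_of_rna($rna), "\n";
print "reverse complement: ", reverse_complement_of_rna($rna), "\n";
```

The first two results match the sequences quoted in the report; the third is simply the second one reversed.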

Anonymous   
Printed Page 85
Exercise 5.6

I believe the previous reader may have been confused by the use of @ARGV in the provided solution, which is not introduced until page 98. Also, it is possible that submitting the DNA strings in lowercase format could have led to the problem, since the program will only work for uppercase sequences.

Anonymous  May 31, 2011 
Printed Page 107
for each loop in the code

When I ran example 6-4.pl after fixing the two bugs described in the text, the program
still did not generate the correct output. It seems that the variable
$receivingcommittment was never set to 1. It turned out that the variable name was
misspelt as "$recieving..." whereas it should be "$receiving...". Further correction
of the variable name would fix the problem.
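
A misspelt variable of this kind is something "use strict" reports at compile time. The sketch below is only an illustration: the two one-line snippets are made up (they assume the declaration uses one spelling and the assignment the other), and compile_error is a hypothetical helper, not code from the book.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet under "use strict" and return the
# error text, or an empty string if it compiled and ran cleanly.
sub compile_error {
    my ($source) = @_;
    my $ok = eval "use strict; $source; 1";
    return $ok ? '' : $@;
}

# Made-up snippets: the first assigns to a misspelt name, the second does not.
my $misspelt = 'my $receivingcommittment = 0; $recievingcommittment = 1';
my $fixed    = 'my $receivingcommittment = 0; $receivingcommittment = 1';

print "misspelt: ", compile_error($misspelt) || "ok\n";
print "fixed:    ", compile_error($fixed)    || "ok\n";
```

Under strict, the misspelt snippet fails with a "Global symbol ... requires explicit package name" error instead of silently creating a second variable.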

Anonymous   
Printed Page 117
exercise 6.5

No argument is passed when the subroutine is called, so the print
statement is always executed, even if the file doesn't exist.
It should be:

if(file_passes_tests($file)) {
print "File $file exists, is a regular file, and is nonzero in size\n";
}
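
As a rough, self-contained illustration of the corrected call, the sketch below passes the file name explicitly. The body of file_passes_tests is an assumption based only on the printed message (exists, regular file, nonzero size), not the book's code, and the temporary file exists only to give the test something to find.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed body: true only if the named file exists, is a regular file,
# and has nonzero size. A missing argument counts as a failure.
sub file_passes_tests {
    my ($file) = @_;
    return 0 unless defined $file && length $file;
    return (-e $file && -f $file && -s $file) ? 1 : 0;
}

# Create a small temporary file so that one of the two checks succeeds.
my ($fh, $tmp) = tempfile(UNLINK => 1);
print {$fh} "ACGT\n";
close $fh;

foreach my $file ($tmp, "$tmp.missing") {
    if (file_passes_tests($file)) {
        print "File $file exists, is a regular file, and is nonzero in size\n";
    }
    else {
        print "File $file fails at least one test\n";
    }
}
```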

Anonymous   
Printed Page 132
first sub-routine (second line)

As the list of nucleotides (A/C/G/T) is specifically stated in the sub-routine
'randomnucleotide' (on Page 133), it seems superfluous to also name them
in this sub-routine ('mutate') and to pass them to the second sub-routine as a
parameter which isn't used.

Anonymous   
Printed Page 143
last paragraph

Hi, when I run the subroutine, I get these error messages:

syntax error at c7_s4.pl line 76, near ")
{"
Global symbol "$count" requires explicit package name at c7_s4.pl line 78.
Global symbol "$length" requires explicit package name at c7_s4.pl line 78.
syntax error at c7_s4.pl line 79, near "}"
Execution of c7_s4.pl aborted due to compilation errors.

#####################################

sub match_percentage {
my ($string1,$string2) =@_;

#assume the two strings with same length

my $length=length($string1);
my ($position);
my ($count) =0;

for ($position=0; $position < $length; ++$position) {

if(substr($string1, $position, 1) eq (substr($string2, $position, 1))
{++$count;}
}
return $count/$length;
}
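
Counting parentheses in the listing above, the "if" line opens one more parenthesis than it closes, which would account for the syntax error reported near ")" and "{". A minimal corrected sketch, keeping the reader's names and adding a one-line usage example that is not part of the original:

```perl
use strict;
use warnings;

sub match_percentage {
    my ($string1, $string2) = @_;

    # Assumes the two strings have the same, nonzero length.
    my $length = length($string1);
    my $count  = 0;

    for (my $position = 0; $position < $length; ++$position) {
        # The condition is now closed before the block begins.
        if (substr($string1, $position, 1) eq substr($string2, $position, 1)) {
            ++$count;
        }
    }
    return $count / $length;
}

# Three of the four positions match.
print match_percentage('ACGT', 'ACGA'), "\n";
```

The "Global symbol" messages in the report are likely follow-on errors from the same line, since the parser loses track of the enclosing scope after the syntax error.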

Anonymous   
Printed Page 146
3rd paragraph

The output of example 7-4 contains "matching positions is 0.24%" and the accompanying
text says "a quarter of the positions match". This would be true if it said 24% or
0.24. 0.24% is a quarter of a percent, not 25 percent. Something is wrong here.
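
One way to make the printed value agree with the text is to scale the fraction by 100 before appending the percent sign. This is a sketch only; as_percent is a hypothetical helper, not a subroutine from the book.

```perl
use strict;
use warnings;

# Hypothetical helper: format a fraction between 0 and 1 as a whole-number percentage.
sub as_percent {
    my ($fraction) = @_;
    return sprintf("%.0f%%", 100 * $fraction);
}

print "matching positions is ", as_percent(0.24), "\n";
```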

Anonymous   
Other Digital Version 148
exercise 7.5

In the answer of the exercise07.05, the subroutine mutate_codon says:


sub mutate_codon {

my($codon) = @_;

my @bases = qw(A C G T);

my $position = int rand 3;

my $base = $bases[$position];

my $newbase;

do {
$newbase = $bases[rand @bases];
} until ($newbase ne $base);

substr($codon, $position, 1) = $newbase;

return $codon;
}

which is not correct. If the author ran this exercise several times, he would realize that sometimes the printed result says

AAC mutates to AAC

The error comes in the line saying:

my $base = $bases[$position];

where the author uses $position to select the corresponding position in the codon, but he indexes the array of bases instead.

The correct subroutine should be:

sub mutate_codon {

my($codon) = @_;

my @bases = qw(A C G T);

my $position = int rand 3;

my $base = substr($codon,$position,1);

my $newbase;

do {
$newbase = $bases[rand @bases];
} until ($newbase ne $base);

substr($codon, $position, 1) = $newbase;

return $codon;
}
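
A quick way to check the corrected subroutine is to call it many times and count how often the codon comes back unchanged. The sketch below reuses the corrected mutate_codon from this report; count_unchanged is a hypothetical helper added only for the check.

```perl
use strict;
use warnings;

# Corrected version from the report: the base to avoid is read from the codon itself.
sub mutate_codon {
    my ($codon) = @_;
    my @bases    = qw(A C G T);
    my $position = int rand 3;
    my $base     = substr($codon, $position, 1);
    my $newbase;

    do {
        $newbase = $bases[rand @bases];
    } until ($newbase ne $base);

    substr($codon, $position, 1) = $newbase;
    return $codon;
}

# Hypothetical helper: number of trials in which the "mutated" codon equals the input.
sub count_unchanged {
    my ($codon, $trials) = @_;
    my $unchanged = 0;
    for (1 .. $trials) {
        $unchanged++ if mutate_codon($codon) eq $codon;
    }
    return $unchanged;
}

print "unchanged in 1000 trials: ", count_unchanged('AAC', 1000), "\n";
```

Because the new base is always different from the base actually present at the chosen position, the count is zero on every run.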





Juan  Jan 02, 2012 
Printed Page 185
2nd paragraph

it will look for restriction enzymes .... the restriction enzymes appear.
->
it will look for restriction sites .... the restriction sites appear.

Anonymous   
Printed Page 191
Example 9.2

In the (errata) correction of this example (changing from a foreach loop over an
array which has been read in, to a while loop which reads in the array - so the range
statement will work) use is made of the
open_file() subroutine.

I didn't remember seeing this subroutine, and it isn't mentioned in the Index (either
under its name, or under subroutines). It is on page 218.

The location should be mentioned both where it is used, and in the Index.

Anonymous   
Printed Page 198
Exercise 9.6

On line 95 of the original answer:

for ( my $i = 1, my $j = shift(@locations) ; @locations ; $i = $j, $j =
shift(@locations) ) {
push(@digest, substr($dna, $i-1, $j-$i));
}

Using this for loop, the last restriction digest is missed: after the last enzyme site
is shifted off, @locations is empty and the loop stops.

The right for loop should look like this:

for ( my $i = 1, my $j = shift(@locations), my $k = 0; $k <= scalar(@locations)+2 ;
$i = $j, $j = shift(@locations) )
{
$k++;
push(@digest, substr($dna, $i-1, $j-$i));
}

Anonymous   
Printed Page 203
3

ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
is given as the location of gbrel.txt, the GenBank release notes, but it
is not correct (or at least not working at the moment).
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
does work.

Anonymous   
Printed Page 211
near bottom of page

the following code in Example 10-2
($annotation, $dna) = ($record =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s);

generates an error (uninitialized value <GBFILE> chunk 1) on my mac, using MacPerl
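
The report does not say why the match fails. Whatever the cause, a failed match leaves both variables undefined, and using them afterwards produces "uninitialized value" warnings. The sketch below checks the match before using its results; split_genbank_record is a hypothetical helper and the record is a tiny made-up example, not real GenBank data.

```perl
use strict;
use warnings;

# Hypothetical helper: return (annotation, dna), or an empty list when the
# record does not have the expected LOCUS ... ORIGIN ... // layout.
sub split_genbank_record {
    my ($record) = @_;
    return unless defined $record;
    if ($record =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s) {
        return ($1, $2);
    }
    return;
}

# Made-up record, for illustration only.
my $record = "LOCUS       TEST\nORIGIN\n        1 acgtacgt\n//\n";

my ($annotation, $dna) = split_genbank_record($record);
if (defined $annotation) {
    print "Annotation:\n$annotation";
    print "Sequence lines:\n$dna";
}
else {
    print "Record did not match the expected layout\n";
}
```

If real records fail this check, differing line endings between the file and the platform's notion of \n would be one thing to rule out, though that is a guess rather than something the report confirms.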

Anonymous   
Printed Page 219
sub get_annotation_and_dna

The final statement

return ($annotation, $dna) needs a ';'

Anonymous   
Printed Page 221
6

Using a hash for annotations is a great idea except in cases where an annotation type
occurs more than once in a Genbank record. I have seen many cases of Genbank records
with multiple REFERENCE annotations. I was hoping that the author would point this
out and have another example showing a hash whose values were arrays of strings.

Anonymous   
Printed Page 221
example 10.5

i have spent extraordinary effort trying to parse the elements of the Features of
Genbank files ... a proper answer to Exercise 10.5 would have been wonderfully
helpful ... it's disingenuous to fail to provide an answer and to say that "it makes
a good class project" when this book should be designed for individuals who have no
teacher; and to state that it is "straightforward but challenging" is a contradiction
in terms ... in fact, it is exactly what i want to be able to do, and have not yet
succeeded with after a great deal of effort

# Answer to Exercise 10.5
#
# The answer to this exercise is left to the student, as it makes a good class
# project. It is a straightforward but challenging extension of material already
# presented in the text; it also can be the basis of interesting and biologically
# focused projects.
#
# Good luck with it!

Anonymous   
Printed Page 222
bottom

This code:

while ( $annotation =~ /^[A-Z].*\n(^\s.*\n)*/gm)

generates a segmentation fault, when the code runs on
any real genbank file, such as hs_ref_chr22.gbs or
hs_ref_chr22.gbk

Anonymous   
Printed Page 223
1

Example 10-6, Parsing GenBank Annotation, which begins on page 221, produces
incorrect results on pages 223 and 224. In particular, the parse_annotation()
subroutine does not check to see if the 'field' ($key:$value) it is about to store in
the hash table has already been stored. As a result, previous occurrences of a
particular field are clobbered and only the last occurrence is recorded. In the
example given, with the input taken from page 201, only the second "REFERENCE" field
is displayed (page 224).

Interestingly, the very next section on parsing the "FEATURES" table warns on page
228 about the possibility of running into this scenario when parsing the FEATURES
table's multiple fields - some of which have the same name. The same coding solution
should have been applied to the entire GenBank record.
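
One way to keep every occurrence of a repeated field is to store an array of strings under each key instead of a single string. The following is a minimal sketch of that idea, not the book's parse_annotation: the subroutine name is hypothetical, the sample annotation is made up, the key is taken to be the leading run of capital letters on an unindented line, and the text is processed line by line rather than with the book's regular expression.

```perl
use strict;
use warnings;

# Hypothetical variant: returns a hash whose values are array references,
# one array element per occurrence of the field.
sub parse_annotation_multi {
    my ($annotation) = @_;
    my %fields;
    my $current;    # array holding the field currently being read

    foreach my $line (split /^/m, $annotation) {
        if ($line =~ /^([A-Z]+)/) {
            # An unindented line starts a new occurrence of a field.
            push @{ $fields{$1} }, $line;
            $current = $fields{$1};
        }
        elsif (defined $current) {
            # Indented lines continue the most recent occurrence.
            $current->[-1] .= $line;
        }
    }
    return %fields;
}

# Made-up annotation with two REFERENCE fields, for illustration only.
my $annotation = "LOCUS       TEST\n"
               . "REFERENCE   1\n  AUTHORS   First,A.\n"
               . "REFERENCE   2\n  AUTHORS   Second,B.\n";

my %fields = parse_annotation_multi($annotation);
print scalar @{ $fields{REFERENCE} }, " REFERENCE fields kept\n";
print "--\n$_" for @{ $fields{REFERENCE} };
```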

Anonymous   
Printed Page 241
last paragraph

.
..
3c
44
pdb1a4o.ent

->

.
..
3c
44
c1
c4
pdb1a4o.ent

The same change also needs to be made on pp. 243, 244, 246, and 247.

Anonymous   
Printed Page 288
code at bottom of page

As noted in another "confirmed" error report, there is an error in the code found at
the bottom of page 288. However, I believe the solution is still in error.

In particular, while the proposed solution (adding parentheses to the regular
expressions; e.g. changing /^Query.*\n/ to /^Query(.*)\n/ and /^Sbjct.*\n/ to
/^Sbjct(.*)\n/) may correct an error (I have not tested the code, so I do not know if
there are other errors), I do not think it will fix the error of the extraneous "ct"
being prepended to the "Subject String" lines in the output at the top of page 289.
That error, I believe, is caused by another faulty regular expression at the very end
of the code; in particular, the line: $subject =~ s/[^acgt]//g;. As you can see,
this line will NOT remove c's and t's from the long, concatenated "Sbjct:" line
created from the HSP hash table. Hence, the multiple occurrences of "ct" in the
output.
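
The effect described above is easy to reproduce on a single line. In the sketch below, the sample line is made up and both subroutine names are hypothetical; the second one shows one possible fix (removing the label before filtering), which is an assumption rather than the book's confirmed correction.

```perl
use strict;
use warnings;

# Filtering alone: the lowercase c and t of the "Sbjct" label survive.
sub clean_naive {
    my ($line) = @_;
    $line =~ s/[^acgt]//g;
    return $line;
}

# Possible fix: strip the "Query"/"Sbjct" label first, then filter.
sub clean_label_first {
    my ($line) = @_;
    $line =~ s/^(?:Query|Sbjct):?//;
    $line =~ s/[^acgt]//g;
    return $line;
}

# Made-up alignment line, for illustration only.
my $line = 'Sbjct: 1 acgtacgt 8';

print "naive:       ", clean_naive($line), "\n";
print "label first: ", clean_label_first($line), "\n";
```

The naive version returns the sequence with "ct" prepended, which matches the symptom in the output described in the report.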

Anonymous