By James Tisdall
The running example is a module called Geneticcode.pm. This example shows how to create simple modules, and I'll give examples of programs that use this module.

Modules extend the protection against name clashes that you get from my and use strict. They also serve as the basic mechanism for defining object-oriented classes.

The package declaration described in the next section is one way to assign separate namespaces to different parts of your code. It gives strong protection against accidentally using a variable name that's used in another part of the program and having the two identically named variables interact in unwanted ways.

You should already know how to use my to restrict the scope of a variable to its enclosing block (between matching curly braces {}) and should be accustomed to using the directive use strict to require the use of my for all variables.
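The scoping rule for my can be seen in a few lines. The following is a minimal sketch, not taken from the original text; the variable names and the string eval used to catch the compile-time error are illustrative choices.

```perl
use strict;
use warnings;

my $dna = 'ACGT';              # visible from here to the end of the file

{
    my $dna = 'TTTT';          # a new variable, visible only inside this block
    print "Inside the block:  $dna\n";
}

print "Outside the block: $dna\n";

# Under "use strict", an undeclared variable is a compile-time error.
# The string eval compiles the offending statement separately, so this
# script can report the failure instead of refusing to start.
my $ok = eval q{ $rna = 'ACGU'; 1 };
print "Undeclared variable rejected: ", ( $ok ? "no" : "yes" ), "\n";
```

The inner $dna disappears at the closing brace, so the outer value is untouched, and the assignment to the undeclared $rna never compiles.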
Together, use strict and my are a great way to protect your program from unintentional reuse of variable names. Make a habit of using my and working under use strict.

A package declaration puts a new namespace in effect. Here's a simple example:

$dna = 'AAAAAAAAAA';
package Mouse;
$dna = 'CCCCCCCCCC';
package Celegans;
$dna = 'GGGGGGGGGG';
All three variables are named $dna. However, they are in three different packages, so they appear in three different symbol tables and are managed separately by the running Perl program.

The first line of code assigns a poly-A fragment of DNA to the first $dna. Because no package is explicitly named, this $dna variable appears in the default namespace main.

The second line of code is the declaration package Mouse;. At this point, the main namespace is no longer active, and the Mouse namespace is brought into play. Note that the name of the namespace is capitalized; it's a well-established convention you should follow. The only noncapitalized namespace you should use is the default main.

Now that the Mouse namespace is in effect, the third line of code, which declares a variable, $dna, is actually declaring a separate variable unrelated to the first. It contains a poly-C fragment of DNA.

The last two lines of code introduce a new package called Celegans and a new variable, also called $dna, that stores a poly-G DNA fragment.

To use these three $dna variables, you need to explicitly state which packages you want the variables from, as the following code fragment demonstrates:

print "The DNA from the main package:\n\n";
print $main::dna, "\n\n";
print "The DNA from the Mouse package:\n\n";
print $Mouse::dna, "\n\n";
print "The DNA from the Celegans package:\n\n";
print $Celegans::dna, "\n\n";
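The package example assigns to undeclared globals, which use strict would reject. One way to keep the same three namespaces under use strict is to declare each variable with our; the sketch below assumes that approach, and wrapping each package in a block is an illustrative choice rather than something the text prescribes.

```perl
use strict;
use warnings;

our $dna = 'AAAAAAAAAA';          # stored in the symbol table as $main::dna

{
    package Mouse;
    our $dna = 'CCCCCCCCCC';      # $Mouse::dna
}

{
    package Celegans;
    our $dna = 'GGGGGGGGGG';      # $Celegans::dna
}

# Back in package main, fully qualified names pick out each variable.
print "main:     $main::dna\n";
print "Mouse:    $Mouse::dna\n";
print "Celegans: $Celegans::dna\n";

# The unqualified name refers to the variable of the current package.
print "current:  $dna\n";
```

Each package declaration ends at its closing brace, so the final print sees the $dna that belongs to main.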
To create a module, put your subroutine definitions in a file named, for example, Newmodule.pm. Now, edit the file and give it a new first line:

package Newmodule;

Then add a new last line:

1;

You've now created a Perl module.

For instance, to make a Celegans module, place subroutines in a file called Celegans.pm, and add a first line:

package Celegans;

Add a last line, 1;, and you've defined a Celegans module. This last line just ensures that the library returns a true value when it's read in. It's annoying, but necessary.

The location of the .pm module files on your computer affects the name of the module, so let's take a moment to sort out the most important points. For all the details, consult the perlmod and perlmodlib parts of the Perl documentation at http://www.perldoc.org. You can also type perldoc perlmod or perldoc perlmodlib at a shell prompt or in a command window.

Here is how the module in Celegans.pm is loaded from another program:

use Celegans;

Perl looks for the file Celegans.pm in the directories listed in the built-in array @INC. You can print that list like so:

print join("\n", @INC), "\n";

On one Linux computer running Perl 5.8.0, this prints:

/usr/local/lib/perl5/5.8.0/i686-linux
/usr/local/lib/perl5/5.8.0
/usr/local/lib/perl5/site_perl/5.8.0/i686-linux
/usr/local/lib/perl5/site_perl/5.8.0
/usr/local/lib/perl5/site_perl/5.6.1
/usr/local/lib/perl5/site_perl/5.6.0
/usr/local/lib/perl5/site_perl
.
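The module recipe and the @INC search can be tried together in one script. This is a sketch under stated assumptions: the temporary directory, the hello subroutine, and the choice to modify @INC at run time are all illustrative and do not come from the original text.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Create a throwaway directory and write a tiny module file into it.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'Newmodule.pm' );

open( my $fh, '>', $file ) or die "Cannot write $file: $!";
print $fh <<'END_MODULE';
package Newmodule;
use strict;
use warnings;

sub hello { return "Hello from Newmodule" }

1;
END_MODULE
close($fh) or die "Cannot close $file: $!";

# Put the directory at the front of the search path, then load the module.
# In a program with a fixed module directory, "use lib '/some/dir';" has
# the same effect at compile time.
unshift @INC, $dir;
require Newmodule;

my $greeting = Newmodule::hello();
print "$greeting\n";
print "Loaded from: $INC{'Newmodule.pm'}\n";
```

After a successful load, the hash %INC records where each module file was found, which is handy when the wrong copy of a module is being picked up.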
This section builds a module called Geneticcode.pm, which implements the genetic code that maps DNA codons to amino acids and then translates a string of DNA sequence data to a protein fragment.

Let's start by creating the file Geneticcode.pm and using it to define the mapping of codons to amino acids in a hash variable called %genetic_code. We'll also discuss a subroutine called codon2aa that uses the hash to translate its codon arguments into amino acid return values.

Here are the contents of Geneticcode.pm:

package Geneticcode;
use strict;
use warnings;
my(%genetic_code) = (
'TCA' => 'S', # Serine
'TCC' => 'S', # Serine
'TCG' => 'S', # Serine
'TCT' => 'S', # Serine
'TTC' => 'F', # Phenylalanine
'TTT' => 'F', # Phenylalanine
'TTA' => 'L', # Leucine
'TTG' => 'L', # Leucine
'TAC' => 'Y', # Tyrosine
'TAT' => 'Y', # Tyrosine
'TAA' => '_', # Stop
'TAG' => '_', # Stop
'TGC' => 'C', # Cysteine
'TGT' => 'C', # Cysteine
'TGA' => '_', # Stop
'TGG' => 'W', # Tryptophan
'CTA' => 'L', # Leucine
'CTC' => 'L', # Leucine
'CTG' => 'L', # Leucine
'CTT' => 'L', # Leucine
'CCA' => 'P', # Proline
'CCC' => 'P', # Proline
'CCG' => 'P', # Proline
'CCT' => 'P', # Proline
'CAC' => 'H', # Histidine
'CAT' => 'H', # Histidine
'CAA' => 'Q', # Glutamine
'CAG' => 'Q', # Glutamine
'CGA' => 'R', # Arginine
'CGC' => 'R', # Arginine
'CGG' => 'R', # Arginine
'CGT' => 'R', # Arginine
'ATA' => 'I', # Isoleucine
'ATC' => 'I', # Isoleucine
'ATT' => 'I', # Isoleucine
'ATG' => 'M', # Methionine
'ACA' => 'T', # Threonine
'ACC' => 'T', # Threonine
'ACG' => 'T', # Threonine
'ACT' => 'T', # Threonine
'AAC' => 'N', # Asparagine
'AAT' => 'N', # Asparagine
'AAA' => 'K', # Lysine
'AAG' => 'K', # Lysine
'AGC' => 'S', # Serine
'AGT' => 'S', # Serine
'AGA' => 'R', # Arginine
'AGG' => 'R', # Arginine
'GTA' => 'V', # Valine
'GTC' => 'V', # Valine
'GTG' => 'V', # Valine
'GTT' => 'V', # Valine
'GCA' => 'A', # Alanine
'GCC' => 'A', # Alanine
'GCG' => 'A', # Alanine
'GCT' => 'A', # Alanine
'GAC' => 'D', # Aspartic Acid
'GAT' => 'D', # Aspartic Acid
'GAA' => 'E', # Glutamic Acid
'GAG' => 'E', # Glutamic Acid
'GGA' => 'G', # Glycine
'GGC' => 'G', # Glycine
'GGG' => 'G', # Glycine
'GGT' => 'G', # Glycine
);
#
# codon2aa
#
# A subroutine to translate a DNA 3-character codon to an amino acid
# Version 3, using hash lookup
sub codon2aa {
    my($codon) = @_;

    $codon = uc $codon;

    if(exists $genetic_code{$codon}) {
        return $genetic_code{$codon};
    }else{
        die "Bad codon '$codon'!!\n";
    }
}

1;

At this point, you may wonder what modules (and package declarations) are good for, since the main result seems to be the necessity to refer to subroutines in the modules with longer names!

To get the short names back, use the Exporter module in the module code and modify the use MODULE declaration in the calling code.

Going back to the Geneticcode.pm module, recall it began with this line:

package Geneticcode;
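Everything after that line belongs to the Geneticcode package, including the code that defines the hash genetic_code and the subroutine codon2aa.

The Exporter module lets a module export symbols into the namespace of the calling code, which can then use the short names (e.g., codon2aa instead of Geneticcode::codon2aa). Here's a short example of how it works (try typing perldoc Exporter to see the whole story):

package Geneticcode;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw(...);     # symbols to export on request

The our declarations keep these two package variables legal under use strict. For example, to export the symbol codon2aa from the module only when explicitly requested:

our @EXPORT_OK = qw(codon2aa);     # symbols to export on request

The calling code then requests the codon2aa symbol like so:

use Geneticcode qw(codon2aa);

Now the calling code can write:

codon2aa($codon);

instead of:

Geneticcode::codon2aa($codon);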
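To try the export mechanism without creating a separate file, the module can be written inline. The sketch below makes several assumptions for illustration: the package lives in a BEGIN block in the same file, the codon table is cut down to three entries, and an explicit call to import stands in for the use statement.

```perl
use strict;
use warnings;

# The module, written inline. In a real program this part lives in
# Geneticcode.pm and ends with "1;".
BEGIN {
    package Geneticcode;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT_OK = qw(codon2aa);      # exported only on request

    # Abbreviated table for illustration; the full module maps all 64 codons.
    my %genetic_code = ( 'ATG' => 'M', 'TGG' => 'W', 'TAA' => '_' );

    sub codon2aa {
        my ($codon) = @_;
        $codon = uc $codon;
        return $genetic_code{$codon} if exists $genetic_code{$codon};
        die "Bad codon '$codon'!!\n";
    }
}

# For a module in its own file, "use Geneticcode qw(codon2aa);" performs
# the equivalent of this import call at compile time.
BEGIN { Geneticcode->import('codon2aa') }

my $short = codon2aa('atg');                 # imported short name
my $long  = Geneticcode::codon2aa('atg');    # fully qualified name

print "Short name: $short\n";
print "Long name:  $long\n";
```

Both calls reach the same subroutine; the import only adds the short name to the caller's namespace.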
The Exporter module that's included in the standard Perl distribution has several other optional behaviors, but the way just shown is the safest and most useful. As you'll see, the object-oriented programming style of using modules doesn't use the Exporter module.

The Comprehensive Perl Archive Network (CPAN, http://www.cpan.org) is an impressively large collection of Perl code (mostly Perl modules). CPAN is easily accessible and searchable on the Web, and you can use its modules for a variety of programming tasks.

There is a CPAN.pm module built-in with Perl that makes downloading and installing modules quite easy (when things work well, which they usually do). If you can't find CPAN.pm, you should consider updating your current version of Perl. For the details, type:

perldoc CPAN

CPAN sorts its modules into the following categories:

Development Support
Operating System Interfaces
Networking Devices IPC
Data Type Utilities
Database Interfaces
User Interfaces
Language Interfaces
File Names Systems Locking
String Lang Text Proc
Opt Arg Param Proc
Internationalization Locale
Security and Encryption
World Wide Web HTML HTTP CGI
Server and Daemon Utilities
Archiving and Compression
Images Pixmaps Bitmaps
Mail and Usenet News
Control Flow Utilities
File Handle Input Output
Microsoft Windows Modules
Miscellaneous Modules
Commercial Software Interfaces
Not In Modulelist
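The program testGeneticcode contains the following loop:

# Translate each three-base codon to an amino acid, and append to a protein
for(my $i=0; $i < (length($dna) - 2) ; $i += 3) {
    $protein .= Geneticcode::codon2aa( substr($dna,$i,3) );
}

A second version of the loop uses while and increments $i inside the call to substr. Because that increment happens before the codon is extracted, $i must start at -3 for the first codon to be translated:

# Translate each three-base codon to an amino acid, and append to a protein
my $i = -3;
while (my $codon = substr($dna, $i += 3, 3) ) {
    $protein .= Geneticcode::codon2aa( $codon );
}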
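To check that a for loop and a while loop over codons produce the same protein, both can be run side by side. This sketch is not from testGeneticcode itself: the local codon2aa with a four-entry table is a hypothetical stand-in for Geneticcode::codon2aa, the test sequence is made up, and the while loop here advances $i in its body rather than inside substr.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Geneticcode::codon2aa, holding only the
# codons needed for this demonstration.
my %genetic_code = ( 'ATG' => 'M', 'GCC' => 'A', 'TGG' => 'W', 'TAA' => '_' );

sub codon2aa {
    my ($codon) = @_;
    $codon = uc $codon;
    exists $genetic_code{$codon} or die "Bad codon '$codon'!!\n";
    return $genetic_code{$codon};
}

my $dna = 'ATGGCCTGGTAA';      # made-up sequence, a multiple of 3 bases long

# Version 1: a C-style for loop over codon start positions.
my $protein_for = '';
for ( my $i = 0 ; $i < length($dna) - 2 ; $i += 3 ) {
    $protein_for .= codon2aa( substr( $dna, $i, 3 ) );
}

# Version 2: a while loop that stops when substr returns an empty string.
my $protein_while = '';
my $i = 0;
while ( my $codon = substr( $dna, $i, 3 ) ) {
    $protein_while .= codon2aa($codon);
    $i += 3;
}

print "for loop:   $protein_for\n";
print "while loop: $protein_while\n";
```

The two versions agree only when the sequence length is a multiple of three; with leftover bases, the for loop skips them while the while loop passes a short codon to codon2aa, which dies.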
Scalar variable names begin with a dollar sign $, as in $dna.

Array variable names begin with an at sign @, as in @peptides. An array can be initialized by a list such as @peptides = ('zeroth', 'first', 'second'). Individual scalar elements of an array are referred to by first preceding the array name with a dollar sign (an individual element of an array is a scalar value) and then following the array name with the position of the desired element in square brackets. Thus the first element of the @peptides array is referenced by $peptides[0] and has the value 'zeroth'. (Note that array elements are given the positions 0, 1, 2, ..., n-1, where n is the number of elements in the array.)

An array prints differently inside and outside double quotes:

@pentamers = ('cggca', 'tgatc', 'ttggc');
print "@pentamers", "\n";
print @pentamers, "\n";

This prints:

cggca tgatc ttggc
cggcatgatcttggc
$peptide = 'EIQADEVRL';
$peptideref = \$peptide;
print "Here is what's in the reference:\n";
print $peptideref, "\n";
print "Here is what the reference is pointing to:\n";
print ${$peptideref}, "\n";
print $$peptideref, "\n";
This prints:

Here is what's in the reference:
SCALAR(0x80fe4ac)
Here is what the reference is pointing to:
EIQADEVRL
EIQADEVRL

First, the string EIQADEVRL is assigned to the scalar variable $peptide. Next, a backslash operator is used before the $peptide variable to return a reference to the variable. This reference is saved in the scalar variable $peptideref.

When you print $peptideref, you get the value:

SCALAR(0x80fe4ac)

This tells you that $peptideref is pointing to a scalar value (which is the value of the scalar variable $peptide). It also gives a hexadecimal number that specifies where in the computer memory the value for that variable resides; the number will differ from one run to the next.

A two-dimensional array can be written as an array of references to anonymous arrays:

@probes = (
    [1, 3, 2, 9],
    [2, 0, 8, 1],
    [5, 4, 6, 7],
    [1, 9, 2, 8]
);

print "The probe at row 1, column 2 has value ", $probes[1][2], "\n";

This prints:

The probe at row 1, column 2 has value 8
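Because each row of such an array is itself a reference, the usual dereferencing syntax applies to it. The following sketch, which is an illustration and not part of the original example, shows a few equivalent ways to reach the same element; the variable names are arbitrary.

```perl
use strict;
use warnings;

my @probes = (
    [1, 3, 2, 9],
    [2, 0, 8, 1],
    [5, 4, 6, 7],
    [1, 9, 2, 8],
);

# Each element of @probes is a reference to an anonymous array.
my $row_ref = $probes[1];
print "Row 1 is a reference of type: ", ref($row_ref), "\n";

# Three equivalent ways to reach row 1, column 2.
my $short    = $probes[1][2];          # arrow between subscripts is optional
my $arrow    = $probes[1]->[2];        # explicit arrow
my $explicit = ${ $probes[1] }[2];     # block dereference

# Dereference a whole row to get an ordinary list.
my @row = @{ $probes[1] };

print "Row 1: @row\n";
print "Element at row 1, column 2: $short $arrow $explicit\n";
```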
# Declare reference to (empty) anonymous array
$array = [ ];

# Initialize the array
for($i=0; $i < 4 ; ++$i) {
    for($j=0; $j < 4 ; ++$j) {
        $array->[$i][$j] = $i * $j;
    }
}

# Reset one of the elements of the array
$array->[3][2] = 99;

# Print the array
for($i=0; $i < 4 ; ++$i) {
    for($j=0; $j < 4 ; ++$j) {
        printf("%3d ", $array->[$i][$j]);
    }
    print "\n";
}

The next example builds a hash of arrays and prints it with Data::Dumper:

use Data::Dumper;
%relatedgenes = ( );
$relatedgenes{'stromelysin'} = [
'C.elegans',
'Arabidopsis thaliana'
];
$relatedgenes{'obesity'} = [
'Drosophila',
'Mus musculus'
];
# Now add a new related organism to the entry for 'stromelysin'
push( @{$relatedgenes{'stromelysin'}}, 'Canis' );
print Dumper(\%relatedgenes);
This prints the following (the Data::Dumper module is described in more detail later; try typing perldoc Data::Dumper for the details of this useful way to print out complex data structures):

$VAR1 = {
'stromelysin' => [
'C.elegans',
'Arabidopsis thaliana',
'Canis'
],
'obesity' => [
'Drosophila',
'Mus musculus'
]
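};

Complex data structures like these are easy to print with the Data::Dumper module. This module comes standard with all recent versions of Perl.

Here is an excerpt from the output of the perldoc Data::Dumper command:

NAME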
Data::Dumper - stringified perl data structures, suitable
for both printing and "eval"
SYNOPSIS
use Data::Dumper;
# simple procedural interface
print Dumper($foo, $bar);
(...)
DESCRIPTION
Given a list of scalars or reference variables, writes out
their contents in perl syntax. The references can also be
objects. The contents of each variable is output in a
single Perl statement. Handles self-referential structures correctly.
The return value can be "eval"ed to get back an identical
copy of the original reference structure.
(...)
The following program prints a two-dimensional array twice, first "by hand" and then with Data::Dumper:

use Data::Dumper;

$array = [ ];

# Initialize the array
for($i=0; $i < 4 ; ++$i) {
    for($j=0; $j < 4 ; ++$j) {
        $array->[$i][$j] = $i * $j;
    }
}

# Print the array "by hand"
for($i=0; $i < 4 ; ++$i) {
    for($j=0; $j < 4 ; ++$j) {
        printf("%3d ", $array->[$i][$j]);
    }
    print "\n";
}

# Print the array using Data::Dumper
print Dumper($array);

This prints:

  0   0   0   0
  0   1   2   3
  0   2   4   6
  0   3   6   9
$VAR1 = [
[
0,
0,
0,
0
],
[
0,
1,
2,
3
],
[
0,
2,
4,
6
],
[
0,
3,
6,
9
]
];
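The perldoc excerpt notes that the dumped text can be passed to eval to rebuild the data. The sketch below tries that round trip on the hash of arrays used earlier. The two configuration variables are real Data::Dumper settings, but choosing them here, and declaring a lexical $VAR1 so that the eval works under use strict, are illustrative assumptions.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent   = 1;   # more compact indentation
$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order

my %relatedgenes = (
    'stromelysin' => [ 'C.elegans',  'Arabidopsis thaliana' ],
    'obesity'     => [ 'Drosophila', 'Mus musculus' ],
);

my $dumped = Dumper( \%relatedgenes );
print $dumped;

# The dump is Perl code that assigns to a variable named $VAR1.
# Declaring that variable first lets the eval run under "use strict".
my $VAR1;
eval $dumped;
die "eval failed: $@" if $@;

my $copy = $VAR1;
print "Round trip kept ", scalar( keys %$copy ), " genes\n";
```

The copy is a separate structure, so changing it afterward leaves %relatedgenes untouched.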
Consider the two words portend and profound. You can apply the following edits to portend to turn it into profound:

portend
    (delete o)
prtend
    (insert o)
protend
    (change t to f)
profend
    (change e to o)
profond
    (insert u)
profound
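The sequence above is one valid path of five edits. Counting the fewest edits needed is a separate question, usually answered with a dynamic-programming table. The following is a minimal sketch of the standard Levenshtein distance, assuming each insertion, deletion, and substitution costs 1; the subroutine name edit_distance is hypothetical and does not come from the original text.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein distance: the fewest single-character insertions,
# deletions, and substitutions that turn $source into $target.
sub edit_distance {
    my ( $source, $target ) = @_;
    my @s = split //, $source;
    my @t = split //, $target;
    my $m = scalar @s;
    my $n = scalar @t;

    # $d[$i][$j] holds the distance between the first $i characters
    # of the source and the first $j characters of the target.
    my @d;
    $d[$_][0] = $_ for 0 .. $m;     # delete $i characters
    $d[0][$_] = $_ for 0 .. $n;     # insert $j characters

    for my $i ( 1 .. $m ) {
        for my $j ( 1 .. $n ) {
            my $cost = $s[ $i - 1 ] eq $t[ $j - 1 ] ? 0 : 1;
            $d[$i][$j] = min(
                $d[ $i - 1 ][$j] + 1,               # deletion
                $d[$i][ $j - 1 ] + 1,               # insertion
                $d[ $i - 1 ][ $j - 1 ] + $cost,     # match or substitution
            );
        }
    }
    return $d[$m][$n];
}

my $distance = edit_distance( 'portend', 'profound' );
print "portend -> profound: $distance edits\n";
```

For this pair the table gives 4, one fewer than the path shown: insert r after p, then change r to f, t to o, and e to u.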
The perlreftut tutorial page from the Perl documentation gives a short introduction to Perl references (type perldoc perlreftut at your command line if Perl is installed, or visit the web page http://www.perldoc.com).

The perlref page from the Perl documentation discusses Perl references in detail.

The perldata page from the Perl documentation gives an introduction to Perl data structures.

The perldsc page from the Perl documentation presents a "cookbook" overview of Perl data structures.

The perllol page from the Perl documentation gives an introduction to arrays of arrays.

Is $$arr[0] the same as $arr->[0]? Why or why not?

Write a min subroutine that returns the minimum of two integers. Rewrite min3 using it.