Chapter 8. The Genetic Code

Up to this point we’ve used Perl to search for motifs, simulate DNA mutations, generate random sequences, and transcribe DNA to RNA. These are all important activities, and they serve as a good introduction to the computational techniques you can use to study biological systems.

In this chapter, we’ll write Perl programs to simulate how the genetic code directs the translation of DNA into protein. I will start by introducing the hash datatype. Then, after a brief discussion of how different data structures (like hashes and arrays) and database systems can store and access experimental information, we will write a program to translate DNA to protein. We’ll also continue exploring regular expressions and write code to handle FASTA files.

Hashes

There are three main datatypes in Perl. You’ve already seen two: scalar variables and arrays. Now we’ll start to use the third: hashes (also called associative arrays).
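For reference, here is a minimal sketch of the three datatypes declared side by side. The variable names and values are illustrative choices, not taken from the text. The point is the leading character ($, @, %) that marks each type.

```perl
use strict;
use warnings;

# Scalar: a single value (a string or a number), marked with $
my $dna = 'ACGTTGCA';

# Array: an ordered list of scalars, marked with @ and indexed from 0
my @bases = ('A', 'C', 'G', 'T');

# Hash: an unordered collection of key/value pairs, marked with %
my %complement = ('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A');

# Each datatype reports its size in its own way
print 'Characters in $dna: ',       length($dna),             "\n";  # 8
print 'Elements in @bases: ',       scalar(@bases),           "\n";  # 4
print 'Keys in %complement: ',      scalar(keys %complement), "\n";  # 4
```

Single quotes are used in the print statements so that $dna, @bases, and %complement appear literally instead of being interpolated.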

A hash provides very fast lookup of the value associated with a key. As an example, say you have a hash called %english_dictionary. (Yes, hashes start with the percent sign.) If you want to look up the definition of the word “recreant,” you say:

$definition = $english_dictionary{'recreant'};

The scalar 'recreant' is the key, and the scalar definition that's returned is the value. As you can see from this example, hashes (like arrays) change their leading character to a dollar sign when you access a single element, because the value returned from a hash ...
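The following is a small, self-contained sketch of the lookup described above. The %english_dictionary here holds only a few hand-written sample entries (a real dictionary would presumably be loaded from a file). It also shows adding an entry by assigning to a key, and using exists to check for a key before reading it, since looking up a missing key simply returns undef.

```perl
use strict;
use warnings;

# Illustrative stand-in for %english_dictionary; these entries are
# sample data, not taken from any real dictionary file.
my %english_dictionary = (
    'recreant' => 'cowardly; unfaithful to duty',
    'codon'    => 'a sequence of three nucleotides',
);

# The whole hash is %english_dictionary, but a single element is a
# scalar, so the lookup uses $ and curly braces around the key.
my $definition = $english_dictionary{'recreant'};
print "recreant: $definition\n";

# Assigning to a key adds a new entry (or overwrites an existing one)
$english_dictionary{'hash'} = 'a table of key/value pairs';

# A missing key returns undef, so test with exists before printing
for my $word ('recreant', 'hash', 'zygote') {
    if (exists $english_dictionary{$word}) {
        print "$word: $english_dictionary{$word}\n";
    } else {
        print "$word: not found\n";
    }
}
```

Checking with exists rather than testing the value directly avoids "use of uninitialized value" warnings under use warnings when a word is not in the hash.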
