Chapter 2. Data Structures and String Algorithms

So far in this book, I’ve used the standard Perl data structures: scalars, arrays, and hashes. However, it is often necessary to handle data whose structure is more complex than those basic types allow. For instance, it is frequently useful to have a two-dimensional array.
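As a rough illustration of the two-dimensional array mentioned above, the following minimal sketch builds a small table from anonymous arrays. The variable names and values are invented for illustration, and the bracket syntax relies on references, which the rest of this chapter introduces.

```perl
use strict;
use warnings;

# A two-dimensional array (illustrative data, not from the text).
# Each row is an anonymous array; @matrix stores a reference to each row.
my @matrix = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
);

# Look up row 1, column 2 (indices start at 0).
my $element = $matrix[1][2];
print "Element at row 1, column 2: $element\n";

# Count the rows, and the columns in the first row.
my $rows = scalar @matrix;
my $cols = scalar @{ $matrix[0] };
print "Dimensions: $rows x $cols\n";
```

A plain Perl array can hold only scalars, so the rows here are stored as references, which are themselves scalars.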

In this chapter, you’ll learn how to define and use references and complex data structures. After you learn the fundamentals, you’ll apply the new techniques to implement a biologically important algorithm. These techniques are also fundamental to the implementation of object-oriented programming, as you’ll see in Chapter 3.

The algorithm we’ll study is called approximate string matching. It lets you find, for instance, the closest match for a peptide fragment in a protein. It relies on dynamic programming, an algorithmic technique that is essential for many similar biological tasks, such as aligning biological sequences. Along the way, you’ll see how Perl references make it possible to write programs for data with more complex relationships.
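To give a feel for the idea before the details, here is a minimal sketch of approximate string matching by dynamic programming. It is not necessarily the implementation developed later in the chapter: it assumes a cost of 1 for each substitution, insertion, or deletion, and the subroutine name best_match_distance and the sample sequences are hypothetical.

```perl
use strict;
use warnings;

# Return the smallest edit distance between $pattern and any substring
# of $text (unit cost per substitution, insertion, or deletion).
sub best_match_distance {
    my ($pattern, $text) = @_;
    my @p = split //, $pattern;
    my @t = split //, $text;
    my $m = scalar @p;
    my $n = scalar @t;

    # Row 0 is all zeros: a match may begin anywhere in the text at no cost.
    my @prev = (0) x ($n + 1);

    for my $i (1 .. $m) {
        # Matching $i pattern characters against an empty substring costs $i.
        my @curr = ($i);
        for my $j (1 .. $n) {
            my $substitute = $prev[$j - 1] + ($p[$i - 1] eq $t[$j - 1] ? 0 : 1);
            my $delete     = $prev[$j] + 1;        # skip a pattern character
            my $insert     = $curr[$j - 1] + 1;    # skip a text character
            my $best = $substitute;
            $best = $delete if $delete < $best;
            $best = $insert if $insert < $best;
            $curr[$j] = $best;
        }
        @prev = @curr;
    }

    # The best match may end anywhere in the text, so take the row minimum.
    my $min = $prev[0];
    for my $cost (@prev) {
        $min = $cost if $cost < $min;
    }
    return $min;
}

print best_match_distance('MKV', 'AAMKVLA'), "\n";   # exact occurrence: 0
print best_match_distance('MKT', 'AAMKVLA'), "\n";   # one substitution: 1
```

This sketch keeps only two rows of the table at a time, which is enough to report the best score. Recovering where the match occurs would require keeping the whole table, a natural use for a two-dimensional array.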

Basic Perl Data Types

Before tackling references, let’s review the basic Perl data types:

Scalar

A scalar value is a string or any one of several kinds of numbers such as integers, floating-point (decimal) numbers, or numbers in scientific notation such as 2.3E23. A scalar variable begins with the dollar sign $, as in $dna.

Array

An array is an ordered ...
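The scalar description above can be illustrated with a short sketch. Apart from $dna and the value 2.3E23, which come from the text, the variable names and values are invented for illustration.

```perl
use strict;
use warnings;

my $dna        = 'ACGTTGCA';  # a string
my $count      = 42;          # an integer
my $fraction   = 0.25;        # a floating-point (decimal) number
my $big_number = 2.3E23;      # a number in scientific notation

# Perl converts between strings and numbers as the operation requires.
my $length = length $dna;
my $total  = $count + $fraction;

print "Sequence $dna has $length bases\n";
print "Total: $total\n";
print "Large value: $big_number\n";
```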
