Chapter 4. Introduction to References

References are the basis for complex data structures, object-oriented programming (OOP), and fancy subroutine magic. They’re the magic that was added between Perl version 4 and version 5 to make it all possible.

A Perl scalar variable holds a single value. An array holds an ordered list of one or more scalars. A hash holds a collection of scalars as values, keyed by other scalars. Although a scalar can be an arbitrary string, which allows complex data to be encoded into an array or hash, none of the three data types are well suited to complex data interrelationships. This is a job for the reference. Let’s look at the importance of references by starting with an example.

Performing the Same Task on Many Arrays

Before the Minnow can leave on an excursion (for example, a three-hour tour), we should check every passenger and crew member to ensure they have all the required trip items in their possession. Let’s say that, for maritime safety, every person on-board the Minnow needs to have a life preserver, some sunscreen, a water bottle, and a rain jacket. We can write a bit of code to check for the Skipper’s supplies:

my @required = qw(preserver sunscreen water_bottle jacket);
my @skipper  = qw(blue_shirt hat jacket preserver sunscreen);

for my $item (@required) {
  unless (grep $item eq $_, @skipper) { # not found in list?
    print "skipper is missing $item.\n";
  }
}

The grep in a scalar context returns the number of times the expression $item eq $_ returns true, which is 1 if the item is in the list and 0 if not.[*] If the value is 0, it’s false, and we print the message.

Of course, if we want to check on Gilligan and the Professor, we might write the following code:

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
for my $item (@required) {
  unless (grep $item eq $_, @gilligan) { # not found in list?
    print "gilligan is missing $item.\n";
  }
}

my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
for my $item (@required) {
  unless (grep $item eq $_, @professor) { # not found in list?
    print "professor is missing $item.\n";
  }
}

You may start to notice a lot of repeated code here and think that we should refactor that into a common subroutine that we can reuse (and you’d be right):

sub check_required_items {
  my $who = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @_) { # not found in list?
      print "$who is missing $item.\n";
    }
  }
}

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
check_required_items('gilligan', @gilligan);

Perl gives the subroutine five items in its @_ array initially: the name gilligan and the four items belonging to Gilligan. After the shift, @_ only has the items. Thus, the grep checks each required item against the list.

So far, so good. We can check the Skipper and the Professor with just a bit more code:

my @skipper   = qw(blue_shirt hat jacket preserver sunscreen);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
check_required_items('skipper', @skipper);
check_required_items('professor', @professor);

And for the other passengers, we repeat as needed. Although this code meets the initial requirements, we’ve got two problems to deal with:

  • To create @_, Perl copies the entire contents of the array to be scanned. This is fine for a few items, but if the array is large, it seems a bit wasteful to copy the data just to pass it into a subroutine.

  • Suppose we want to modify the original array to force the provisions list to include the mandatory items. Because we have a copy in the subroutine (“pass by value”), any changes we make to @_ aren’t reflected automatically in the corresponding provisions array.[*]

To solve either or both of these problems, we need pass by reference rather than pass by value. And that’s just what the doctor (or Professor) ordered.

Taking a Reference to an Array

Among its many other meanings, the backslash (\) character is also the “take a reference to” operator. When we use it in front of an array name, e.g., \@skipper, the result is a reference to that array. A reference to the array is like a pointer: it points at the array, but it is not the array itself.

A reference fits wherever a scalar fits. It can go into an element of an array or a hash, or into a plain scalar variable, like this:

my $reference_to_skipper = \@skipper;

The reference can be copied:

my $second_reference_to_skipper = $reference_to_skipper;

or even:

my $third_reference_to_skipper = \@skipper;

We can interchange all three references. We can even say they’re identical, because, in fact, they are the same thing.

if ($reference_to_skipper =  = $second_reference_to_skipper) {
  print "They are identical references.\n";
}

This equality compares the numeric forms of the two references. The numeric form of the reference is the unique memory address of the @skipper internal data structure, unchanging during the life of the variable. If we look at the string form instead, with eq or print, we get a debugging string:

ARRAY(0x1a2b3c)

which again is unique for this array because it includes the hexadecimal (base 16) representation of the array’s unique memory address. The debugging string also notes that this is an array reference. Of course, if we ever see something like this in our output, it almost certainly means we have a bug; users of our program have little interest in hex dumps of storage addresses!

Because we can copy a reference, and passing an argument to a subroutine is really just copying, we can use this code to pass a reference to the array into the subroutine:

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
check_required_items("The Skipper", \@skipper);

sub check_required_items {
  my $who = shift;
  my $items = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  ...
}

Now $items in the subroutine is a reference to the array of @skipper. But how do we get from a reference back into the original array? We dereference the reference, of course.

Dereferencing the Array Reference

If you look at @skipper, you’ll see that it consists of two parts: the @ symbol and the name of the array. Similarly, the syntax $skipper[1] consists of the name of the array in the middle and some syntax around the outside to get at the second element of the array (index value 1 is the second element because index values start at 0).

Here’s the trick: we can place any reference to an array in curly braces in place of the name of an array, ending up with a method to access the original array. That is, wherever we write skipper to name the array, we use the reference inside curly braces: { $items }. For example, both of these lines refer to the entire array:

@  skipper
@{ $items }

whereas both of these refer to the second item of the array:[*]

$  skipper [1]
${ $items }[1]

By using the reference form, we’ve decoupled the code and the method of array access from the actual array. Let’s see how that changes the rest of this subroutine:

sub check_required_items {
  my $who   = shift;
  my $items = shift;

  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @{$items}) { # not found in list?
      print "$who is missing $item.\n";
    }
  }
}

All we did was replace @_ (the copy of the provisions list) with @{$items}, a dereferencing of the reference to the original provisions array. Now we can call the subroutine a few times, as before:

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
check_required_items('The Skipper', \@skipper);

my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
check_required_items('Professor', \@professor);

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
check_required_items('Gilligan', \@gilligan);

In each case, $items points to a different array, so the same code applies to different arrays each time we invoke it. This is one of the most important uses of references: decoupling the code from the data structure on which it operates so we can reuse the code more readily.

Passing the array by reference fixes the first of the two problems we mentioned earlier. Now, instead of copying the entire provision list into the @_ array, we get a single element of a reference to that provisions array.

Could we have eliminated the two shifts at the beginning of the subroutine? Sure, but we sacrifice clarity:

sub check_required_items {
  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @{$_[1]}) { # not found in list?
      print "$_[0] is missing $item.\n";
    }
  }
}

We still have two elements in @_. The first element is the passenger or crew member name, which we use in the error message. The second element is a reference to the correct provisions array, which we use in the grep expression.

Getting Our Braces Off

Most of the time, the array reference we want to dereference is a simple scalar variable, such as @{$items} or ${$items}[1]. In those cases, we can drop the curly braces, unambiguously forming @$items or $$items[1].

However, we cannot drop the braces if the value within the braces is not a simple scalar variable. For example, for @{$_[1]} from that last subroutine rewrite, we can’t remove the braces. That’s a single element access to an array, not a scalar variable.

This rule also means that it’s easy to see where the “missing” braces need to go. When we see $$items[1], a pretty noisy piece of syntax, we can tell that the curly braces must belong around the simple scalar variable, $items. Therefore, $items must be a reference to an array.

Thus, an easier-on-the-eyes version of that subroutine might be:

sub check_required_items {
  my $who   = shift;
  my $items = shift;

  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @$items) { # not found in list?
      print "$who is missing $item.\n";
    }
  }
}

The only difference here is that we removed the braces around @$items.

Modifying the Array

You’ve seen how to solve the excessive copying problem with an array reference. Now let’s look at modifying the original array.

For every missing provision, we push that provision onto an array, forcing the passenger to consider the item:

sub check_required_items {
  my $who   = shift;
  my $items = shift;

  my @required = qw(preserver sunscreen water_bottle jacket);
  my @missing = (  );

  for my $item (@required) {
    unless (grep $item eq $_, @$items) { # not found in list?
      print "$who is missing $item.\n";
      push @missing, $item;
    }
  }

  if (@missing) {
    print "Adding @missing to @$items for $who.\n";
    push @$items, @missing;
  }
}

Note the addition of the @missing array. If we find any items missing during the scan, we push them into @missing. If there’s anything there at the end of the scan, we add it to the original provision list.

The key is in the last line of that subroutine. We’re dereferencing the $items array reference, accessing the original array, and adding the elements from @missing. Without passing by reference, we’d modify only a local copy of the data, which has no effect on the original array.

Also, @$items (and its more generic form, @{$items}) works within a double-quoted string. We can’t include any whitespace between the @ and the immediately following character, although we can include nearly arbitrary whitespace within the curly braces as if it were normal Perl code.

Nested Data Structures

In this example, the array @_ contains two elements, one of which is also an array. What if we take a reference to an array that also contains a reference to an array? We end up with a complex data structure, which can be quite useful.

For example, we can iterate over the data for the Skipper, Gilligan, and the Professor by first building a larger data structure holding the entire list of provision lists:

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my @skipper_with_name = ('Skipper', \@skipper);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
my @professor_with_name = ('Professor', \@professor);
my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
my @gilligan_with_name = ('Gilligan', \@gilligan);

At this point, @skipper_with_name has two elements, the second of which is an array reference similar to what we passed to the subroutine. Now we group them all:

my @all_with_names = (
  \@skipper_with_name,
  \@professor_with_name,
  \@gilligan_with_name,
);

Note that we have just three elements, each of which is a reference to an array that has two elements: the name and its corresponding initial provisions. A picture of that is in Figure 4-1.

The array @all_with_names holds a multilevel data structure containing strings and references to arrays
Figure 4-1. The array @all_with_names holds a multilevel data structure containing strings and references to arrays

Therefore, $all_with_names[2] will be the array reference for the Gilligan’s data. If you dereference it as @{$all_with_names[2]}, you get a two-element array, "Gilligan" and another array reference.

How do we access that array reference? Using our rules again, it’s ${$all_with_names[2]}[1]. In other words, taking $all_with_names[2], we dereference it in an expression that would be something like $DUMMY[1] as an ordinary array, so we’ll place {$all_with_names[2]} in place of DUMMY.

How do we call the existing check_required_items( ) with this data structure? The following code is easy enough.

for my $person (@all_with_names) {
  my $who = $$person[0];
  my $provisions_reference = $$person[1];
  check_required_items($who, $provisions_reference);
}

This requires no changes to the subroutine. The control variable $person will be each of $all_with_names[0], $all_with_names[1], and $all_with_names[2], as the loop progresses. When we dereference $$person[0], we get “Skipper,” “Professor,” and “Gilligan,” respectively. $$person[1] is the corresponding array reference of provisions for that person.

Of course, we can shorten this as well, since the entire dereferenced array matches the argument list precisely:

for my $person (@all_with_names) {
  check_required_items(@$person);
}

or even:

check_required_items(@$_) for @all_with_names;

As you can see, various levels of optimization can lead to obfuscation. Be sure to consider where your head will be a month from now when you have to reread your own code. If that’s not enough, consider the new person who will take over your job after you have left.[*]

Simplifying Nested Element References with Arrows

Look at the curly-brace dereferencing again. As in our earlier example, the array reference for Gilligan’s provision list is ${$all_with_names[2]}[1]. Now, what if we want to know Gilligan’s first provision? We need to dereference this item one more level, so it’s yet another layer of braces: ${${$all_with_names[2]}[1]}[0]. That’s a really noisy piece of syntax. Can we shorten that? Yes!

Everywhere we write ${DUMMY}[$y], we can write DUMMY->[$y] instead. In other words, we can dereference an array reference, picking out a particular element of that array by simply following the expression defining the array reference with an arrow and a square-bracketed subscript.

For this example, this means we can pick out the array reference for Gilligan with a simple $all_with_names[2]->[1], and Gilligan’s first provision with $all_with_names[2]->[1]->[0]. Wow, that’s definitely easier on the eyes.

If that weren’t already simple enough, there’s one more rule: if the arrow ends up between “subscripty kinds of things,” such as square brackets, we can also drop the arrow. $all_with_names[2]->[1]->[0] becomes $all_with_names[2][1][0]. Now it’s looking even easier on the eyes.

The arrow has to be between non-subscripty things. Why wouldn’t it be between subscripty things? Well, imagine a reference to the array @all_with_names:

my $root = \@all_with_names;

Now how do we get to Gilligan’s first item?

$root -> [2] -> [1] -> [0]

More simply, using the “drop arrow” rule, we can use:

$root -> [2][1][0]

We cannot drop the first arrow, however, because that would mean an array @root’s third element, an entirely unrelated data structure. Let’s compare this to the full curly-brace form again:

${${${$root}[2]}[1]}[0]

It looks much better with the arrow. Note, however, that no shortcut gets the entire array from an array reference. If we want all of Gilligan’s provisions, we say:

@{$root->[2][1]}

Reading this from the inside out, we can think of it like this:

  • Take $root.

  • Dereference it as an array reference, taking the third element of that array (index number 2).

  • Dereference that as an array reference, taking the second element of that array (index number 1).

  • Dereference that as an array reference, taking the entire array.

The last step doesn’t have a shortcut arrow form. Oh well.[*]

References to Hashes

Just as we can take a reference to an array, we can also take a reference to a hash. Once again, we use the backslash as the “take a reference to” operator:

my %gilligan_info = (
  name     => 'Gilligan',
  hat      => 'White',
  shirt    => 'Red',
  position => 'First Mate',
);
my $hash_ref = \%gilligan_info;

We can dereference a hash reference to get back to the original data. The strategy is similar to dereferencing an array reference. We write the hash syntax as we would have without references and then replace the name of the hash with a pair of curly braces surrounding the thing holding the reference. For example, to pick a particular value for a given key, we do this:

my $name = $ gilligan_info { 'name' };
my $name = $ { $hash_ref } { 'name' };

In this case, the curly braces have two different meanings. The first pair denotes the expression returning a reference, while the second pair delimits the expression for the hash key.

To perform an operation on the entire hash, we proceed similarly:

my @keys = keys % gilligan_info;
my @keys = keys % { $hash_ref };

As with array references, we can use shortcuts to replace the complex curly-braced forms under some circumstances. For example, if the only thing inside the curly braces is a simple scalar variable (as shown in these examples so far), we can drop the curly braces:

my $name = $$hash_ref{'name'};
my @keys = keys %$hash_ref;

Like an array reference, when referring to a specific hash element, we can use an arrow form:

my $name = $hash_ref->{'name'};

Because a hash reference fits wherever a scalar fits, we can create an array of hash references:

my %gilligan_info = (
  name     => 'Gilligan',
  hat      => 'White',
  shirt    => 'Red',
  position => 'First Mate',
);
my %skipper_info = (
  name     => 'Skipper',
  hat      => 'Black',
  shirt    => 'Blue',
  position => 'Captain',
);
my @crew = (\%gilligan_info, \%skipper_info);

Thus, $crew[0] is a hash reference to the information about Gilligan. We can get to Gilligan’s name via any one of:

${ $crew[0] } { 'name' }
my $ref = $crew[0]; $$ref{'name'}
$crew[0]->{'name'}
$crew[0]{'name'}

On that last one, we can still drop the arrow between “subscripty kinds of things,” even though one is an array bracket and one is a hash brace.

Let’s print a crew roster:

my %gilligan_info = (
  name     => 'Gilligan',
  hat      => 'White',
  shirt    => 'Red',
  position => 'First Mate',
);
my %skipper_info = (
  name     => 'Skipper',
  hat      => 'Black',
  shirt    => 'Blue',
  position => 'Captain',
);
my @crew = (\%gilligan_info, \%skipper_info);

my $format = "%-15s %-7s %-7s %-15s\n";
printf $format, qw(Name Shirt Hat Position);
for my $crewmember (@crew) {
  printf $format,
    $crewmember->{'name'},
    $crewmember->{'shirt'},
    $crewmember->{'hat'},
    $crewmember->{'position'};
}

That last part looks very repetitive. We can shorten it with a hash slice. Again, if the original syntax is:

@ gilligan_info { qw(name position) }

the hash slice notation from a reference looks like:

@ { $hash_ref } { qw(name position) }

We can drop the first brace pair because the only thing within is a simple scalar value, yielding:

@ $hash_ref { qw(name position) }

Thus, we can replace that final loop with:

for my $crewmember (@crew) {
  printf $format, @$crewmember{qw(name shirt hat position)};
}

There is no shortcut form with an arrow (->) for array slices or hash slices, just as there is no shortcut for entire arrays or hashes.

A hash reference prints as a string that looks like HASH(0x1a2b3c), showing the hexadecimal memory address of the hash. That’s not very useful to an end user and only barely more usable to the programmer, except as an indication of the lack of appropriate dereferencing.

Exercises

You can find the answers to these exercises in “Answers for Chapter 4" in the Appendix.

Exercise 1 [5 min]

How many different things do these expressions refer to?

$ginger->[2][1]
${$ginger[2]}[1]
$ginger->[2]->[1]
${$ginger->[2]}[1]

Exercise 2 [30 min]

Using the final version of check_required_items, write a subroutine check_items_for_all that takes a hash reference as its only parameter, pointing at a hash whose keys are the people aboard the Minnow and whose corresponding values are array references of the things they intend to bring onboard.

For example, the hash reference might be constructed like so:

my @gilligan  = ... gilligan items ...;
my @skipper   = ... skipper items ...;
my @professor = ... professor items ...;
my %all = (
  Gilligan  => \@gilligan,
  Skipper   => \@skipper,
  Professor => \@professor,
);
check_items_for_all(\%all);

The newly constructed subroutine should call check_required_items for each person in the hash, updating their provisions list to include the required items.



[*] There are more efficient ways to check list membership for large lists, but for a few items, this is probably the easiest way to do so with just a few lines of code.

[*] Actually, assigning new scalars to elements of @_ after the shift modifies the corresponding variable being passed, but that still wouldn’t let us extend the array with additional mandatory provisions.

[*] Note that we added whitespace in these two displays to make the similar parts line up. This whitespace is legal in a program, even though most programs won’t use it.

[*] O’Reilly Media has a great book to help you be nice to the next guy. Perl Best Practices by Damian Conway has 256 tips on writing more readable and maintainable Perl code.

[*] It’s not that it hasn’t been discussed repeatedly by the Perl developers; it’s just that nobody has come up with a nice backward-compatible syntax with universal appeal.

Get Intermediate Perl now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.