A View of the Internals

Let us now take a look inside Perl to understand how it manages memory. You can safely skip this section without loss of continuity.

A variable logically represents a binding between a name and a value, as Figure 1-3 illustrates.[13]

Figure 1-3. A variable is a name and value pair

An array or a hash is not just a collection of numbers or strings. It is a collection of scalar values, and this distinction is important, as Figure 1-4 illustrates.

Figure 1-4. An array value is a collection of scalar values

Each box in Figure 1-4 represents a distinct value. An array has one value that represents the collection of scalar values. Each element of the array is a distinct scalar value. This is analogous to a pride of lions being treated as a single entity (which is why we refer to it in the singular) that has properties distinct from those of the individual lions.

Notice also that while a name always points to a value, a value doesn’t always have to be pointed to by a name, as the array elements in Figure 1-4 or anonymous arrays and hashes exemplify.

Reference Counts

To support painless and transparent memory management, Perl maintains a reference count for every value, whether it is directly pointed to by a name or not. Let’s add this piece of information to our earlier view. Refer to Figure 1-5.

Figure 1-5. Adding reference counts to all values

As you can see, the reference count represents the number of arrows pointing to the value part of a variable. Because there is always an arrow from the name to its value, the variable’s reference count is at least 1. When you obtain a reference to a variable, the corresponding value’s reference count is incremented.
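The increment can be observed directly. The following is a minimal sketch, assuming a perl recent enough that the core Devel::Peek module exports SvREFCNT with a reference prototype, so that it reports a variable's count without including its own temporary reference. The variable names follow the text; the helper variables such as $count_named are introduced here only for illustration.

```perl
use strict;
use warnings;
use Devel::Peek qw(SvREFCNT);   # core module; SvREFCNT reports a value's reference count

my $a = 10;                     # one arrow: from the name $a to its value
my $count_named = SvREFCNT($a);

my $ra = \$a;                   # a second arrow: from $ra's value to $a's value
my $count_referenced = SvREFCNT($a);

undef $ra;                      # drop the reference again
my $count_released = SvREFCNT($a);

print "name only:          $count_named\n";
print "with a reference:   $count_referenced\n";
print "reference released: $count_released\n";
```

On such a perl, the second count should be one higher than the first, and the third should match the first again.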

It is important to stress the point that even though we would like to think of $ra as pointing to $a, it really points to $a’s value. In fact, $ra does not even know whether the value it is pointing to has a corresponding entry in the symbol table. The value of the reference variable is the address of another scalar value, which does not change even if $a’s value changes.
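A short sketch of this point: a reference stringifies to the type and address of the value it points to, so comparing that string before and after an assignment shows whether the address moved. The names $before and $after are illustrative only.

```perl
use strict;
use warnings;

my $a  = 10;
my $ra = \$a;

my $before = "$ra";             # a reference stringifies to something like SCALAR(0x...)
$a = 'a different value';       # change $a's value in place
my $after = "$ra";

print "same address\n" if $before eq $after;
print "seen through the reference: $$ra\n";
```

The reference keeps pointing at the same scalar value, and dereferencing it yields whatever that value currently contains.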

Perl automatically deletes a value when its reference count drops to zero. When variables (named values) go out of scope, the binding between the name and the value is removed, and the value’s reference count is decremented. In the typical case, in which the count was 1, it drops to zero and the value is deleted (or garbage collected).[14]
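The less typical case, in which something else still refers to the value when its name goes out of scope, can be sketched as follows. This assumes the core Devel::Peek module with a prototyped SvREFCNT (as found in reasonably recent perls); $scoped and $survivor are hypothetical names chosen for the example.

```perl
use strict;
use warnings;
use Devel::Peek qw(SvREFCNT);

my $survivor;                   # declared outside the block, so it outlives it
my ($count_inside, $count_outside);

{
    my $scoped = 'still here';  # one arrow: the name $scoped
    $survivor  = \$scoped;      # a second arrow: the reference held in $survivor
    $count_inside = SvREFCNT($scoped);
}
# The name $scoped is gone, but the value is kept alive by $survivor.
$count_outside = SvREFCNT($$survivor);

print "inside the block:  $count_inside\n";
print "outside the block: $count_outside\n";
print "value: $$survivor\n";
```

Leaving the block removes only the arrow from the name, so the count drops by one and the value survives until $survivor lets go of it as well.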

Reference counting is sometimes referred to as “poor man’s garbage collection,” in contrast to the much more sophisticated techniques used by environments such as LISP, Java, and Smalltalk (though early versions of Smalltalk used reference counting). One problem is that reference counts take up space, which adds up when you consider that every piece of data in your application has an extra integer associated with it.

Then there is also the problem of circular references. The simplest case is this:

$a = \$a;

This is a classic case of narcissism. $a’s reference count indicates that something is pointing to it, so it will never get freed. A more practical case of circular references is that of network graphs (each node keeps track of each of its neighbors) or ring buffers (where the last element points to the first one). Modern garbage collection algorithms implemented in Java and Smalltalk can detect circular references and deallocate the entire circular structure if none of the elements are reachable from other variables.
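The ring-buffer case can be sketched with a two-element ring. Node is a hypothetical class written only for this illustration; its DESTROY method counts how many nodes Perl has actually freed, and make_ring is likewise an assumed helper.

```perl
use strict;
use warnings;

package Node;                   # hypothetical class, used only for this illustration

my $destroyed = 0;              # how many Node values Perl has freed so far

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, next => undef }, $class;
}

sub DESTROY { $destroyed++ }    # called when a Node's reference count reaches zero

package main;

sub make_ring {
    my $first = Node->new('first');
    my $last  = Node->new('last');
    $first->{next} = $last;
    $last->{next}  = $first;    # the last element points back to the first
    return ($first, $last);
}

{
    my ($first, $last) = make_ring();
}                               # both names are gone, yet each node still points to the other
my $freed_with_cycle = $destroyed;

{
    my ($first, $last) = make_ring();
    $last->{next} = undef;      # break the cycle by hand before the names go away
}
my $freed_after_break = $destroyed;

print "freed while the ring was intact: $freed_with_cycle\n";
print "freed after breaking the ring:   $freed_after_break\n";
```

With the ring intact, neither node's count reaches zero when the block ends, so nothing is freed. Once one link is removed by hand, both nodes are released as soon as their names go out of scope.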

On the other hand, reference counting is simple to understand and implement and makes it easy to integrate Perl with C or C++ code. Please refer to item 2 in Section 1.8 at the end of the chapter for a comprehensive treatment of garbage collection techniques.

Note that while symbolic references allow you to access variables in an indirect way, no actual reference variables are created. In other words, the reference count of a symbolically accessed variable is not modified. Hence symbolic references are also called soft references, in contrast to hard references, which actually allocate storage to keep track of the indirection.
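The difference between the two kinds of reference can be sketched as follows, again assuming Devel::Peek's prototyped SvREFCNT is available. The package variable $count and the other names are hypothetical; a package variable is used because symbolic references look names up in the symbol table.

```perl
use strict;
use warnings;
use Devel::Peek qw(SvREFCNT);

our $count = 10;                # package variable, reachable by name
my $before = SvREFCNT($count);

my $name = 'count';             # a soft reference is just a string holding a name
{
    no strict 'refs';           # symbolic references are disallowed under strict
    ${$name}++;                 # looks up $main::count by name and increments it
}
my $after_soft = SvREFCNT($count);

my $hard = \$count;             # a hard reference points at the value itself
my $after_hard = SvREFCNT($count);

print "value is now:         $count\n";
print "count before:         $before\n";
print "after soft reference: $after_soft\n";
print "after hard reference: $after_hard\n";
```

The symbolic access changes the variable's value but leaves its reference count alone, whereas taking the hard reference raises the count by one.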

This is similar to the concept of soft versus hard links in the Unix filesystem. The i-node of a file has its reference count incremented every time someone creates a hard link to that file, so you can’t really delete the file’s contents until its reference count goes down to zero. A symbolic link, on the other hand, stores only the name of the file and can point to a nonexistent file; you’ll never know until you try to open the file using the symbolic link.
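The filesystem side of the analogy can be tried out from Perl itself. This sketch assumes a Unix-like system whose temporary directory supports both hard and symbolic links; all file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);          # scratch directory, removed at exit
my $file = "$dir/data.txt";                # hypothetical file names throughout
my $hard = "$dir/hard_link";
my $soft = "$dir/soft_link";

open my $out, '>', $file or die "cannot create $file: $!";
print {$out} "hello\n";
close $out or die "cannot close $file: $!";

link $file, $hard    or die "link failed: $!";
my $links_with_hard = (stat $file)[3];     # field 3 of stat is the hard-link count
symlink $file, $soft or die "symlink failed: $!";
my $links_with_soft = (stat $file)[3];     # a symbolic link does not change it

unlink $file or die "unlink failed: $!";   # remove the original name

open my $in, '<', $hard or die "cannot open $hard: $!";
my $contents = <$in>;                      # the contents survive through the hard link
close $in;

my $soft_is_dangling = (-l $soft && !-e $soft) ? 1 : 0;

print "links after hard link: $links_with_hard\n";
print "links after soft link: $links_with_soft\n";
print "read through hard link: $contents";
print "soft link dangling: $soft_is_dangling\n";
```

The hard link keeps the file's contents alive after the original name is removed, while the symbolic link is left pointing at a name that no longer exists.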

Array/Hash References Versus Element References

Recall that there is a distinction between the array as a whole and each of its constituent scalar values. The array’s value maintains its own reference count, and each of its elements has its own. When you take a reference to an array, its own reference count is incremented without its elements getting affected, as shown in Figure 1-6.

Figure 1-6. Taking a reference to an array
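The situation in Figure 1-6 can be checked with a few lines of code. This is a sketch under the assumption that Devel::Peek's prototyped SvREFCNT is available; $r_array and the helper variables are illustrative names.

```perl
use strict;
use warnings;
use Devel::Peek qw(SvREFCNT);

my @array = (10, 20, 30);

my $array_before = SvREFCNT(@array);       # the array value itself
my $elem_before  = SvREFCNT($array[0]);    # one of its scalar elements

my $r_array = \@array;                     # reference to the array as a whole

my $array_after = SvREFCNT(@array);
my $elem_after  = SvREFCNT($array[0]);

print "array count:   $array_before, then $array_after\n";
print "element count: $elem_before, then $elem_after\n";
```

Only the array's own count should change; the element's count is the same before and after the reference is taken.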

In contrast, Figure 1-7 shows the picture when you create a reference to an element of an array or a hash.

Figure 1-7. Referring to a list element

When you take a reference to an element of an array (or a hash), Perl increments that scalar value’s reference count. If, say, you now pop it from the array, its reference count goes down by 1 because the array no longer holds the scalar value. But since there is still an outstanding reference to the former array element (so its reference count is still 1), it is not destroyed. Figure 1-8 shows the picture after @array has been popped once.

Figure 1-8. @array popped once; $r_array_elem holds on to the popped scalar
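The sequence leading up to Figure 1-8 can be sketched as follows, using the names @array and $r_array_elem from the figure. It assumes Devel::Peek's prototyped SvREFCNT, and that the reference is taken to the last element, which is the one pop removes; the remaining variable names are illustrative.

```perl
use strict;
use warnings;
use Devel::Peek qw(SvREFCNT);

my @array = (10, 20, 30);
my $r_array_elem = \$array[2];             # reference to the last element

# Two arrows now point at the scalar: the array slot and the reference.
my $count_in_array = SvREFCNT($$r_array_elem);

pop @array;                                # the array lets go of its last element

# Only the reference is left, which is enough to keep the scalar alive.
my $count_after_pop = SvREFCNT($$r_array_elem);
my $remaining       = scalar @array;

print "count while in the array: $count_in_array\n";
print "count after the pop:      $count_after_pop\n";
print "elements left in array:   $remaining\n";
print "popped value still reads: $$r_array_elem\n";
```

The popped scalar's count drops by one but does not reach zero, so the value remains readable through $r_array_elem even though the array no longer contains it.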



[13] This is true whether the variable is global, dynamically scoped (using local()), or lexically scoped (using my()). More details are given in Chapter 3.

[14] For efficiency, Perl doesn’t actually delete it; it just sends it to its own free pool and reuses it when you need a new value. It is logically deleted, nevertheless.
