Computer Science & Perl Programming

Chapter 4. Precedence

Mark Jason Dominus

What Is Precedence?

What’s 2 + 3 x 4?

You probably learned about this in grade school; it was fourth-grade material in the New York City public school I attended. If not, that’s okay too; I’ll explain everything.

It’s well-known that 2 + 3 x 4 is 14, because we are supposed to do the multiplication before the addition. 3 x 4 is 12, and then we add the 2 and get 14. What we do not do is perform the operations in left-to-right order; if we did that we would add 2 and 3 to get 5, then multiply by 4 and get 20.

This is just a convention about what an expression like 2 + 3 x 4 means. It’s not an important mathematical fact; it’s just a rule about how to interpret certain ambiguous arithmetic expressions. It could have gone the other way, or we could have the rule that the operations are always done left-to-right. But we don’t have those rules; we have the rule that says that you do the multiplication first and then the addition. We say that multiplication takes precedence over addition.

What if we really do want to say: “Add 2 and 3, and multiply the result by 4”? Then we use parentheses, like this: (2 + 3) x 4. The rule about parentheses is that expressions in parentheses must always be fully evaluated before anything else.

If we always used the parentheses, we wouldn’t need rules about precedence. There wouldn’t be any ambiguous expressions. We have precedence rules because we’re lazy and we like to leave out the parentheses when we can. The fully-parenthesized form is always unambiguous. The precedence rule tells us how to read a version with fewer parentheses by deciding which fully-parenthesized version it stands for. In the example above, the candidates are:

2 + (3 x 4)
(2 + 3) x 4

Is 2 + 3 x 4 like the first or like the second? The precedence rule just tells us that it is like the first.

Rules and More Rules

In grade school we learned a few more rules:

4 x 5²

Which of these interpretations is correct?

(4 x 5)² = 400

4 x (5²) = 100

The rule is that exponentiation takes precedence over multiplication, so it’s 100 and not 400.

What about 8 – 3 + 4? Is this like (8 – 3) + 4 = 9 or 8 – (3 + 4) = 1? Here the rule is a little different. Neither + nor – has precedence over the other. Instead, the – and + are just done left-to-right. This rule handles the case of 8 – 4 – 3 also. Is it (8 – 4) – 3 = 1 or is it 8 – (4 – 3) = 7? Subtractions are done left-to-right, so it’s 1 and not 7. A similar left-to-right rule handles ties between x and /.

Our rules are getting complicated now:

Exponentiation first.
Next multiplication and division, left to right.
Then addition and subtraction, left to right.

Can we leave out the “left-to-right” part and just say that all ties are broken left-to-right? No, because that isn’t true for exponentiation:

2^2^3 (that is, 2 raised to the power 2³)

means

2^(2³) = 2⁸ = 256, not (2²)³ = 4³ = 64.

So a tower of exponents is resolved from the upper right to the lower left. Perl writes exponentiation with the token **, as x**y instead of x^y, and it follows the same rule: x**y**z means x**(y**z), not (x**y)**z, so ** is resolved right-to-left.
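As a quick check that Perl follows the grade-school rules, here is a minimal sketch; the variable names are made up for illustration, and each comment gives the value under Perl’s precedence and associativity.

```perl
use strict;
use warnings;

# Multiplication binds tighter than addition: 2 + (3 * 4)
my $sum_first  = 2 + 3 * 4;      # 14, not 20

# Exponentiation binds tighter than multiplication: 4 * (5 ** 2)
my $power      = 4 * 5 ** 2;     # 100, not 400

# Subtraction associates left to right: (8 - 4) - 3
my $left_assoc = 8 - 4 - 3;      # 1, not 7

# ** associates right to left: 2 ** (2 ** 3) = 2 ** 8
my $tower      = 2 ** 2 ** 3;    # 256, not (2 ** 2) ** 3 = 64

print "$sum_first $power $left_assoc $tower\n";    # prints "14 100 1 256"
```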

Programming languages have the same notational problem, and it’s even worse than in mathematics, partly because programming languages have so many more operator symbols; Perl, for example, has at least 70. This is a problem, because communication with the compiler and with other programmers must be unambiguous. We don’t want to write something like 2 + 3 x 4 and have Perl compute 20 when we wanted 14, or vice versa.

Nobody knows a really good solution to this problem, and different languages solve it in different ways. For example, the language APL, which has a whole lot of unfamiliar operators such as ρ, dispenses with precedence entirely and resolves them all from right to left. The advantage of this is that we don’t have to remember any rules, and the disadvantage is that many expressions are confusing: if we write 2 x 3 + 4, we get 14, not 10. In LISP the issue never comes up, because in LISP the parentheses are required, and so there are no ambiguous expressions. (Now you know why LISP looks the way it does.)

Perl, with its 70 operators, has to solve this problem somehow. The strategy Perl takes (like most other programming languages) is to take the fourth-grade system and extend it to deal with the new operators. The operators are divided into many precedence levels, and certain operations, like multiplication, have higher precedence than other operations, like addition. The levels are essentially arbitrary, and are chosen without any deep plan, but with the hope that you will be able to omit most of the parentheses most of the time and still get what you want. So, for example, Perl gives * a higher precedence than +, and ** a higher precedence than *, just like in grade school.

An Explosion of Rules

Let’s look at some examples of why the precedence levels are set the way they are. Suppose we wrote something like this:

	$v = $x + 3;

This is actually ambiguous. It might mean:

	($v = $x) + 3;

or it might mean:

	$v = ($x + 3);

The first of these is silly, because it stores the value $x into $v, and then computes the value of $x + 3 and throws the result of the addition away. In this case, the addition was useless. The second one, however, makes sense, because it does the addition first and stores the result into $v. Since people write things like:

	$v = $x + 3;

all the time, and expect to get the second behavior and not the first, Perl’s = operator has low precedence, lower than the precedence of +, so that Perl uses the second interpretation.

Here’s another example:

	$result = $x =~ /foo/;

means this:

	$result = ($x =~ /foo/);

which looks to see if $x contains the string foo, and stores a true or false result into $result. It doesn’t mean this:

	($result = $x) =~ /foo/;

which copies the value of $x into $result and then looks to see if $result contains foo. In this case it’s likely that the programmer wanted the first meaning, not the second. But sometimes we do want it to go the other way. Consider this expression:

	$p = $q =~ s/w//g;

Again, this expression is interpreted this way:

	$p = ($q =~ s/w//g);

All the w’s are removed from $q, and the number of successful substitutions is stored into $p. However, sometimes we really do want the other meaning:

	($p = $q) =~ s/w//g;

This copies the value of $q into $p, and then removes all the w’s from $p, leaving $q alone. If we want this, we have to include the parentheses explicitly, because = has lower precedence than =~.
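A minimal sketch of both groupings, using a made-up string so we can see which variable gets modified:

```perl
use strict;
use warnings;

my $q = 'wow window';

# = is lower than =~, so s/// edits $q and its count goes to $count
my $count = $q =~ s/w//g;        # $q is now 'o indo', $count is 4

my $orig = 'wow window';

# Explicit parentheses: copy first, then edit only the copy
(my $copy = $orig) =~ s/w//g;    # $copy is 'o indo', $orig is unchanged
```

Perl 5.14 and later also provide the /r modifier, which leaves the original string alone and returns the modified copy, as in my $copy = $orig =~ s/w//gr; that form sidesteps the precedence question entirely.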

Often, the rules do what we want them to. Consider this:

	$worked = 1 + $s =~ /pattern/;

There are five ways to interpret this:

(1) ($worked = 1) + ($s =~ /pattern/);
(2) (($worked = 1) + $s) =~ /pattern/;
(3) ($worked = (1 + $s)) =~ /pattern/;
(4) $worked = ((1 + $s) =~ /pattern/);
(5) $worked = (1 + ($s =~ /pattern/));

We already know that + has higher precedence than =, so it happens before =, and that rules out (1) and (2).

We also know that =~ has higher precedence than =, so that rules out (3).

To choose between (4) and (5), we need to know whether + takes precedence over =~ or vice versa. (4) will convert $s to a number, add 1 to it, convert the resulting number to a string, and do the pattern match. That is a pretty silly thing to do. (5) will match $s against the pattern, return a boolean result, add 1 to that result to yield the number 1 or 2, and store the number into $worked. That makes a lot more sense; perhaps $worked will be used later to index an array. We should hope that Perl chooses interpretation (5) rather than (4). And in fact, that is what it does, because =~ has higher precedence than +. =~ behaves similarly with respect to multiplication.
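To see interpretation (5) in action, here is a minimal sketch; the strings are made up for illustration. A successful match yields 1 and a failed one yields the empty string, which counts as 0 in arithmetic:

```perl
use strict;
use warnings;

# =~ binds tighter than +, and + tighter than =, so both lines below
# parse as $var = (1 + ($string =~ /pattern/)).
my $s = 'this line contains the pattern';
my $worked = 1 + $s =~ /pattern/;    # match succeeds: 1 + 1 == 2

my $t = 'no match here';
my $failed = 1 + $t =~ /pattern/;    # match fails: 1 + '' == 1
```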

Our table of precedence is shaping up:

** (right to left)
=~
* / (left to right)
+ - (left to right)
=

How are several = operators in a row resolved: left-to-right or right-to-left? The question is whether this:

	$a = $b = $c;

will mean this:

	($a = $b) = $c;

or this:

	$a = ($b = $c);

The first one means to store the value of $b into $a, and then to store the value of $c into $a; this is obviously not useful. But the second one means to store the value of $c into $b, and then to store that value into $a also, and that obviously is useful. So, = is resolved right-to-left.
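A minimal sketch of the right-to-left grouping (the variable names are arbitrary):

```perl
use strict;
use warnings;

my ($p, $q, $r) = (1, 2, 3);

# = associates right to left, so this is $p = ($q = $r):
# $r is copied into $q, and that value is then copied into $p.
$p = $q = $r;    # all three are now 3
```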

Why does =~ have lower precedence than **? No good reason. It’s just a side effect of the low precedence of =~ and the high precedence of **. It’s probably very rare to have =~ and ** in the same expression anyway. Perl tries to get the common cases right. Here’s another common case:

	if ($x == 3 && $y == 4) { ... }

Is this interpreted as:

(1) (($x == 3) && $y) == 4
(2) ($x == 3) && ($y == 4)
(3) ($x == (3 && $y)) == 4
(4) $x == ((3 && $y) == 4)
(5) $x == (3 && ($y == 4))

We really hope that it will be (2). To make (2) occur, && must have lower precedence than ==; if the precedence is higher we’ll get (3) or (4), which would be awful. So && has lower precedence than ==. If this seems like an obvious decision, consider that Pascal got it wrong.

|| has about the same precedence as &&, but slightly lower, in accordance with the usual convention of mathematicians, and by analogy with * and +. ! has high precedence, because when people write:

	!$x .....some long complicated expression....

they almost always mean that the ! applies to the $x, not to the entire long complicated expression. In fact, almost the only time they don’t mean this is in cases like this one:

	if (! $x->{annoying}) { ... }

It would be very annoying if this were interpreted to mean:

	if ((! $x)->{annoying}) { ... }

The same argument we used to explain why ! has high precedence works even better and explains why -> has even higher precedence. In fact, -> has the highest precedence of all. If ## and @@ are any two operators at all, then:

	$a ## $x->$y

and

	$x->$y @@ $b

always mean

	$a ## ($x->$y)

and

	($x->$y) @@ $b

and not

	($a ## $x)->$y

	$x->($y @@ $b)
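To see the ! and -> discussion in running code, here is a minimal sketch with a hypothetical hash reference $x. The second grouping is wrapped in eval because, under use strict, it dies:

```perl
use strict;
use warnings;

my $x = { annoying => 0 };

# -> binds tighter than !, so this is ! ($x->{annoying})
my $calm = ! $x->{annoying};    # true, because annoying is 0

# Forcing the other grouping negates the reference itself: ! $x is the
# empty string, which is not a hash reference, so dereferencing it dies
# under 'use strict'. eval traps the error so we can inspect $@.
my $lived = eval { my $v = (! $x)->{annoying}; 1 };
```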

For a long time, the operator with lowest precedence was the , operator. The , operator is for evaluating two expressions in sequence. For example:

	$a*=2 , $c*=3

doubles $a and triples $c. It would be a shame if we wrote something like this:

	$a*=2 , $c*=3 if $change_the_variables;

and Perl interpreted it to mean this:

	$a*= (2, $c) *= 3 if $change_the_variables;

That would certainly be bizarre. The very low precedence of , ensures that we can write:

	EXPR1, EXPR2

for any two expressions at all, and be sure that they are not going to get mashed together to make some nonsense expression like $a *= (2, $c) *= 3.

The comma is also the list constructor operator. If we want to make a list of three things, we have to write:

	@list = ('Gold', 'Frankincense', 'Myrrh');

because if we left off the parentheses, like this:

	@list = 'Gold', 'Frankincense', 'Myrrh';

what we would get would be the same as this:

	(@list = 'Gold'), 'Frankincense', 'Myrrh';

This gives @list a single element ('Gold') and then evaluates the two remaining strings in sequence and throws them away, which is pointless. So this is a prime example of a case where the default precedence rules don’t do what we want. But people are already in the habit of putting parentheses around their list elements, so nobody minds this very much, and the problem isn’t really a problem at all.
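A minimal sketch comparing the two assignments. The unparenthesized version triggers Perl’s "Useless use of a constant in void context" warning, so warnings of that category are switched off around it:

```perl
use strict;
use warnings;

# With parentheses, the commas build a three-element list
my @gifts = ('Gold', 'Frankincense', 'Myrrh');    # 3 elements

# Without them, = binds tighter than the comma, so only 'Gold' is
# assigned; the other two strings are evaluated and discarded.
my @list;
{
    no warnings 'void';
    @list = 'Gold', 'Frankincense', 'Myrrh';      # 1 element
}
```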

Precedence Traps and Surprises

This very low precedence for commas causes some other problems, however. Consider the common idiom:

	open(F, "< $file") || die "Couldn't open $file: $!";

This tries to open a filehandle, and if it can’t, it aborts the program with an error message. Now watch what happens if we leave the parentheses off the open call:

	open F, "< $file" || die "Couldn't open $file: $!";

The comma has very low precedence, so the || takes precedence here, and Perl interprets the expression as if we had written this:

	open F, ("< $file" || die "Couldn't open $file: $!");

This is totally bizarre, because the die will be executed only when the string "< $file" is false, which never happens. Since the die is controlled by the string and not by the open call, the program will not abort on errors the way we wanted. Here we wish that || had lower precedence, so that we could write:

	try to perform big long hairy complicated action    || die ;

and be sure that the || was not going to gobble up part of the action the way it did in our open example. Perl 5 introduced a new version of || that has low precedence, for exactly this purpose. It’s spelled or, and in fact it has the lowest precedence of all Perl’s operators. We can write:

	try to perform big long hairy complicated action    or die ;

and be quite sure that or will not gobble up part of the action the way it did in our open example, whether or not we leave off the parentheses. To summarize:

	open(F, "< $file") or die "Couldn't open $file: $!"; # OK

	open F, "< $file"  or die "Couldn't open $file: $!"; # OK

	open(F, "< $file") || die "Couldn't open $file: $!"; # OK

	open F, "< $file"  || die "Couldn't open $file: $!"; # Whoops!

If we use or, we’re safe from this error, and if we always put in the parentheses, we’re safe. Pick a strategy and stick with it.
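Here is a minimal sketch that shows the difference at run time, assuming the hypothetical path below does not exist. eval traps the die so we can see whether it happened:

```perl
use strict;
use warnings;

my $file = '/nonexistent/precedence-demo.txt';    # hypothetical, assumed missing

# Buggy: || binds to the string "< $file", which is always true,
# so die never runs even though open fails.
my $buggy_lived = eval { open my $fh, "< $file" || die "Couldn't open $file: $!\n"; 1 };

# Correct: the low-precedence 'or' applies to the result of open.
my $fixed_lived = eval { open my $fh, "< $file" or die "Couldn't open $file: $!\n"; 1 };
```

Current Perl code usually spells this with the three-argument form and a lexical filehandle, open(my $fh, '<', $file) or die "Couldn't open $file: $!"; the same or-versus-|| rule applies to it.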

The other major use for || is to select a value from the first source that provides it. For example:

	$directory = $opt_D || $ENV{DIRECTORY} || $DEFAULT_DIRECTORY;

This looks first for a -D command-line option specifying the directory; if there isn’t one, it looks to see if the user set the DIRECTORY environment variable; if neither of these is set, it uses a hard-wired default directory. It takes the first true value it finds; for example, if we have the environment variable set and also supply an explicit -D option when we run the program, the option overrides the environment variable. The precedence of || is higher than that of =, so this means what we wanted:

	$directory = ($opt_D || $ENV{DIRECTORY} || $DEFAULT_DIRECTORY);

But some people might end up sabotaging themselves by writing something like this:

	$directory = $opt_D or $ENV{DIRECTORY} or $DEFAULT_DIRECTORY;

or has extremely low precedence, even lower than =, so Perl interprets this as:

	($directory = $opt_D) or $ENV{DIRECTORY} or $DEFAULT_DIRECTORY;

$directory is always assigned from the command line option, even if none was set. Then the values of the expressions $ENV{DIRECTORY} and $DEFAULT_DIRECTORY are thrown away. Perl’s -w option will warn us about this mistake if we make it. To avoid it, remember this rule of thumb: use || for selecting values, and or for controlling the flow of statements.
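A minimal sketch of both versions; the directory values are hypothetical, and $opt_D is left undefined to simulate running without -D. The "void context" warnings that -w produces for the or version are silenced so the demonstration runs quietly:

```perl
use strict;
use warnings;

my $opt_D;                                    # no -D option given
$ENV{DIRECTORY} = '/tmp/from-env';            # hypothetical environment setting
my $DEFAULT_DIRECTORY = '/usr/local/data';    # hypothetical default

# || binds tighter than =, so the first true value is assigned
my $directory = $opt_D || $ENV{DIRECTORY} || $DEFAULT_DIRECTORY;

# or binds looser than =, so only $opt_D (undef) is assigned
my $oops;
{
    no warnings 'void';
    $oops = $opt_D or $ENV{DIRECTORY} or $DEFAULT_DIRECTORY;
}
```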

List Operators and Unary Operators

A related problem is that all of Perl’s list operators have very low precedence on their right-hand side, so they tend to gobble up everything to their right. (A list operator is a Perl function that accepts a list of arguments, like open or print.) We already saw this problem with open. Here’s a similar problem:

	@successes = (unlink $new, symlink $old, $new, open N, $new);

This isn’t even clear to humans. What we really meant was:

	@successes = (unlink($new), symlink($old, $new), open(N, $new));

which performs the three operations in sequence and stores the three success-or-failure codes into @successes. But what Perl thought we meant here was something totally different:

	@successes = (unlink($new, symlink($old, $new, open(N, $new))));

It thinks that the result of the open call should be used as the third argument to symlink, and that the result of symlink should be passed to unlink, which will try to remove a file with that name. This won’t even compile, because symlink wants two arguments, not three. We saw one way to disambiguate this; another is to write it like this:

	@successes = ((unlink $new), (symlink $old, $new), (open N, $new));

Again, pick a style and stick with it.

Why do Perl list operators gobble up everything to the right? Often, it’s very handy. For example:

	@textfiles = grep -T, map "$DIRNAME/$_", readdir DIR;

Here Perl behaves as if we had written this:

	@textfiles = grep(-T, (map("$DIRNAME/$_", (readdir(DIR)))));

Some filenames are read from the directory handle with readdir, and the resulting list is passed to map, which turns each filename into a full pathname and returns a list of paths. Then grep filters the list of paths, extracts all the paths that refer to text files, and returns a list of just the text files from the directory.
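The readdir and -T example needs a real directory, so this minimal sketch swaps in a list of numbers to show the same right-to-left gobbling by grep and map:

```perl
use strict;
use warnings;

# Each list operator takes everything to its right as its arguments,
# so this parses as grep(EXPR, map(EXPR, 1 .. 6)).
my @picked = grep $_ % 4 == 0, map $_ * 2, 1 .. 6;

# map doubles 1..6 to (2, 4, 6, 8, 10, 12); grep keeps (4, 8, 12)
print "@picked\n";
```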

One fine point is that parentheses don’t always mean what we want. For example, suppose we had this:

	print $a, $b, $c;

Then we discover that we need to print out double the value of $a. If we do this, we’re safe:

	print 2*$a, $b, $c;

but if we do this, we might get a surprise:

	print (2*$a), $b, $c;

If a list operator is followed by parentheses, Perl assumes that the parentheses enclose all the arguments, so it interprets this as:

	(print (2*$a)), $b, $c;

It prints out twice $a, but doesn’t print out $b or $c at all. (Perl warns us about this if we have -w on.) To fix this, add more parentheses:

	print ((2*$a), $b, $c);

Some people will suggest that we do this instead:

	print +(2*$a), $b, $c;

Perl does what we want here, but I think it’s bad advice because it looks bizarre.
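To watch the print trap happen, this sketch redirects print into a string through an in-memory filehandle. The capture helper is hypothetical, written just for this demonstration. Warnings are switched off around the trap, since -w would otherwise complain about it, as noted above:

```perl
use strict;
use warnings;

# Run a code block with print's default output redirected into a string.
sub capture {
    my ($code) = @_;
    my $buf = '';
    open my $out, '>', \$buf or die "Can't open in-memory filehandle: $!";
    my $old = select $out;    # make $out the default output handle
    $code->();
    select $old;              # restore the previous default handle
    close $out;
    return $buf;
}

my ($n, $y, $z) = (5, 'b', 'c');

my $surprise = do {
    no warnings;              # silence "print (...) interpreted as function" etc.
    capture(sub { print (2*$n), $y, $z });
};                            # only '10': $y and $z are never printed

my $fixed = capture(sub { print ((2*$n), $y, $z) });    # '10bc'
my $plus  = capture(sub { print +(2*$n), $y, $z });     # '10bc'
```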

Here’s a similar example:

	print @items, @more_items;

Say we want to join up the @items with some separator, so we use join:

	print join '---', @items, @more_items;

Oops; this is wrong; we only want to join @items, not @more_items also. One way we might try to fix this is:

	print (join '---', @items), @more_items;

This falls afoul of the problem we just saw: Perl sees the parentheses, assumes that they contain the arguments of print, and never prints @more_items at all. To fix this, use either of these constructs:

	print ((join '---', @items), @more_items);

	print join('---', @items), @more_items;
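The same gobbling is easy to observe without print, by building the argument list in an array. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @items      = ('a', 'b');
my @more_items = ('c', 'd');

# join is a list operator, so it gobbles @more_items too: one element
my @gobbled = (join '---', @items, @more_items);    # ('a---b---c---d')

# Parentheses directly after join delimit its arguments: three elements
my @wanted  = (join('---', @items), @more_items);    # ('a---b', 'c', 'd')
```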

Sometimes we won’t have this problem. Some of Perl’s built-in functions are named unary operators, which means that they take at most one argument. defined and uc are examples. They don’t share the list operators’ habit of gobbling everything to the right; they gobble only one argument. Here’s an example similar to the one just shown:

	print $a, $b;

Now we decide we want to print $a in all lowercase letters:

	print lc $a, $b;

Don’t we have the same problem as in the print join example? If we did, it would print $b in all lowercase also. But it doesn’t, because lc is a unary operator and only gets one argument. This doesn’t need any fixing.
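A minimal sketch of the named unary behavior, using lc and defined inside a list:

```perl
use strict;
use warnings;

# lc is a named unary operator, so it takes only the next argument
my @args = (lc 'HELLO', 'WORLD');    # ('hello', 'WORLD')

# defined behaves the same way: defined($undef_var), then 'next'
my $undef_var;
my @checks = (defined $undef_var, 'next');    # ('', 'next')
```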

Complete Rules of Precedence

Perl’s complete precedence table is shown in Table 4-1.

Table 4-1. Perl’s operator precedences

Operator	Associativity
Terms and list operators (leftward)	left
->	left
++ --	nonassoc
**	right
! ~ \ and unary + and -	right
=~ !~	left
* / % x	left
+ - .	left
<< >>	left
Named unary operators	nonassoc
< > <= >= lt gt le ge	nonassoc
== != <=> eq ne cmp	nonassoc
&	left
| ^	left
&&	left
||	left
.. ...	nonassoc
?:	right
= += -= *= etc.	right
, =>	left
List operators (rightward)	nonassoc
not	right
and	left
or xor	left

This is straight out of the perlop documentation that comes with Perl. left and right mean that the operators associate to the left or the right, respectively; nonassoc means that the operators don’t associate at all. For example, if we try to write:

	$a < $b < $c

Perl 5 will deliver a syntax error message (at least before version 5.32, which added support for chained comparisons). Perhaps what we really meant was:

	$a < $b && $b < $c

The precedence table is much too big and complicated to remember; that’s a problem with Perl’s approach. We have to trust it to handle the common cases correctly, and be prepared to deal with bizarre, hard-to-find bugs when it doesn’t do what we wanted. The alternatives have their own disadvantages.

How to Remember All the Rules

Probably the best strategy for dealing with Perl’s complicated precedence hierarchy is to cluster the operators mentally:

Arithmetic: +, -, *, /, %, **

Bitwise: &, |, ~, <<, >>

Logical: &&, ||, !

Comparison: ==, !=, >=, <=, >, <

Assignment: =, +=, -=, *=, /=, etc.

Try to remember how the operators behave within each group. Mostly the answer will be “They behave as expected.” For example, the operators in the “arithmetic” group all behave according to the rules from fourth grade. The “comparison” group all have about the same precedence, and we aren’t allowed to mix them anyway, except to say something like:

	$a<$b == $c<$d

which compares the truth values of $a<$b and $c<$d.
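A minimal sketch with arbitrary numbers, comparing the truth values of two relations:

```perl
use strict;
use warnings;

my ($p, $q, $r, $s) = (1, 2, 3, 4);

# < binds tighter than ==, so this is ($p < $q) == ($r < $s)
my $same = $p < $q == $r < $s;    # true: both comparisons are true
my $diff = $p < $q == $s < $r;    # false: true compared with false
```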

Then, once we’re familiar with the rather unsurprising behavior of the most common groups, we can just use parentheses liberally everywhere else.

Starting in Perl 5.005_03, we can use the B::Deparse module to print out what the expression would look like if it had all the implied parentheses inserted. We can use this to make sure Perl is interpreting an expression in the way we think it is. For example, let’s check to make sure we gave the right interpretation of $worked = 1 + $s =~ /pattern/ earlier:

	perl -MO=Deparse,-p -e '$worked = 1 + $s =~ /pattern/'

Perl prints out:

	($worked = (1 + ($s =~ /pattern/)));

as we expected. The -p option here stands for “print precedence-preserving parentheses.”
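B::Deparse can also be driven from inside a program. This sketch assumes the module’s documented object interface (new with the -p option, then coderef2text); the exact layout of the output, such as surrounding braces and pragma lines, may vary between Perl versions, so only the parenthesized statement matters here:

```perl
use strict;
use warnings;
use B::Deparse;

our ($worked, $s);    # package variables, so the sub below compiles under strict

# '-p' asks for the same precedence-revealing parentheses as -MO=Deparse,-p
my $deparse = B::Deparse->new('-p');
my $text    = $deparse->coderef2text(sub { $worked = 1 + $s =~ /pattern/ });

# The body should contain ($worked = (1 + ($s =~ /pattern/)));
print "$text\n";
```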

Quiz

Try to guess how Perl interprets the following expressions (or use B::Deparse to check):

$x = $x | $y << 3;
$y % 4 == 0 && $y % 100 != 0 || $y % 400 == 0
$V = 4/3*$PI*$r**3;
$x >= 1 || $x <= 10

Answers

$x = ($x | ($y << 3));
((($y % 4) == 0) && (($y % 100) != 0)) || (($y % 400) == 0) (This computes whether or not the year $y is a leap year.)
$V = (((4/3)*$PI)*($r**3)); (This is the volume of a sphere with radius $r.)
($x >= 1) || ($x <= 10)
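The leap-year answer is easy to check by wrapping the quiz expression in a sub; is_leap_year is a name made up for this sketch:

```perl
use strict;
use warnings;

# The quiz expression, unchanged. Perl parses it as
# ((($y % 4) == 0) && (($y % 100) != 0)) || (($y % 400) == 0)
sub is_leap_year {
    my ($y) = @_;
    return $y % 4 == 0 && $y % 100 != 0 || $y % 400 == 0;
}

# Prints 1900: common, 2000: leap, 2023: common, 2024: leap
printf "%d: %s\n", $_, is_leap_year($_) ? 'leap' : 'common' for 1900, 2000, 2023, 2024;
```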
