Games, Diversions & Perl Culture by Jon Orwant

Chapter 4. Perl Style

Kurt Starsinic

What is good coding practice? What is readable code?

For some programmers, these questions lead to heated arguments. In the relatively young field of programming, it’s natural that generally accepted rules of style and usage haven’t yet emerged. Fortunately, our colleagues in the more mature field of philology (the study of language as used in literature) have set examples that we can follow. In this article, I’ll describe Fathom, a module that grades the readability of Perl programs.

Background

You may have experience with the grammar check feature of some word processors, which finds likely spelling, grammar, and usage errors in your documents. These tools can be quite useful, particularly for people who don’t do much writing, or for people who haven’t had much writing instruction.

As a programmer who works mostly in teams, often training new or junior programmers during time-critical projects, I want automated ways to encourage compliance with team coding standards. I know that such tools can (and do) work for business writing, but I’ve been unable to find a tool that would do the job for business coding. I did some investigation to see if any of the available grammar checkers could be adapted for use with program code.

Existing Measures

There are many well-known measures of readability in literature. You may have heard of Flesch-Kincaid, FOG, SMOG, Bormuth, or other readability or grade level tests; Microsoft Word uses three Flesch-Kincaid tools to evaluate style. These tests generally look at the average number of syllables per word and the average number of words per sentence, then report a single number which indicates either the grade level (1–12) or readability (usually 1–100) of the document. As an example, the Flesch-Kincaid formula for determining the grade level of a document is:

  ((average sentence length in words) * 0.39)
+ ((average syllables per word) * 11.8)
- 15.59
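
As a rough illustration, the formula above can be written as a small Perl subroutine. This is only a sketch: the name fk_grade_level and the sample counts are made up for the example, and it takes the word, sentence, and syllable counts as given rather than attempting to count syllables itself.

```perl
use strict;
use warnings;

# Flesch-Kincaid grade level from raw counts, following the formula above.
# (fk_grade_level is a hypothetical name, not part of any module.)
sub fk_grade_level {
    my ($words, $sentences, $syllables) = @_;
    die "need at least one word and one sentence\n"
        unless $words > 0 && $sentences > 0;
    return 0.39 * ($words / $sentences)
         + 11.8 * ($syllables / $words)
         - 15.59;
}

# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
printf "grade level: %.2f\n", fk_grade_level(100, 5, 150);
```

With these invented counts (20 words per sentence, 1.5 syllables per word), the sketch reports a grade level of 9.91.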

Unfortunately, these measures don’t map well onto code; for example, how many syllables are there in ++ or { or $_? Is select easier to read than gethostbyname?

Once I realized that I wouldn’t be able to simply run one of the prose-readability tests on my code and get meaningful results, I began to study the design and function of those tests. Then, I constructed a working model for code readability.

The Basic Units

After thinking about tools like Flesch-Kincaid, and discussing the idea of a readability tool with colleagues, I came up with a basic model for a code readability metric. I decided to measure the number of tokens per expression, the number of expressions per statement, and the number of statements per subroutine. Some sample tokens:

++
$foo::bar
;
{
&&
any keyword

Some sample expressions:

0.2
($a + 6)
wantarray ? @a : 0

And some sample statements:

$a = $foo::bar * 7;
$x++;

The Tool

Given the basic model I’ve described, I wrote a module, Fathom, that grades the readability of a Perl program. It rates on an open-ended scale, where 1 indicates a trivial program, 5 indicates “mature” code, 7 indicates very sophisticated code, and anything over 7 is Very Hairy. I established the following norms for mature code:

3 tokens per expression
6 expressions per statement
20 statements per subroutine

From this, I came up with the following formula:

code complexity =
  ((average expression length in tokens) * 0.55)
+ ((average statement length in expressions) * 0.28)
+ ((average subroutine length in statements) * 0.08)

If you plug the norms (3, 6, 20) into this formula, you’ll see that ideal mature code scores 4.93 (1.65 + 1.68 + 1.60) rather than exactly 5; that’s because I rounded all the multipliers to two decimal digits, to keep things simple.
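
The arithmetic can be checked with a short Perl sketch. The subroutine name complexity is invented for the example. The second set of multipliers rests on an assumption that the text does not state outright: that each norm was meant to contribute one third of the target score of 5.

```perl
use strict;
use warnings;

# The norms for mature code and the multipliers as published above.
my @NORMS     = (3, 6, 20);
my @PUBLISHED = (0.55, 0.28, 0.08);

# Assumption (not stated in the text): each norm should contribute
# one third of the target score of 5, giving unrounded multipliers.
my @EXACT = map { (5 / 3) / $_ } @NORMS;

# Weighted sum of the three averages (hypothetical helper).
sub complexity {
    my ($weights, @averages) = @_;
    my $score = 0;
    $score += $weights->[$_] * $averages[$_] for 0 .. 2;
    return $score;
}

printf "published multipliers: %.2f\n", complexity(\@PUBLISHED, @NORMS);
printf "unrounded multipliers: %.2f\n", complexity(\@EXACT,     @NORMS);
```

Under that assumption the published multipliers give 4.93 and the unrounded ones give 5.00.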

Usage

First, you’ll need to install Fathom. You can find it on CPAN, under authors/id/K/KS/KSTAR.

After installing Fathom, you can invoke it through the compiler front end O, which loads the backend as B::Fathom:

perl -MO=Fathom filename

The output looks like this:

315 tokens
97 expressions
17 statements
1 subroutine
readability is 4.74 (easier than the norm)
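
To see how the final line follows from the four totals, here is a sketch that reads a report in the format shown above and recomputes the score. The names parse_counts and score are hypothetical, and the sketch assumes that each average is a simple ratio of totals; that assumption reproduces the sample score, but Fathom's own source may compute it differently.

```perl
use strict;
use warnings;

# Sample report text, copied from the output above.
my $report = <<'END';
315 tokens
97 expressions
17 statements
1 subroutine
readability is 4.74 (easier than the norm)
END

# Pull the four totals out of the report; labels may be singular or plural.
sub parse_counts {
    my ($text) = @_;
    my %count;
    while ($text =~ /^(\d+)\s+(token|expression|statement|subroutine)s?$/mg) {
        $count{$2} = $1;
    }
    return %count;
}

# Recompute the score (assumed: each average is a ratio of two totals).
# A report with a zero total would need a guard against division by zero.
sub score {
    my (%c) = @_;
    return 0.55 * $c{token}      / $c{expression}
         + 0.28 * $c{expression} / $c{statement}
         + 0.08 * $c{statement}  / $c{subroutine};
}

my %count = parse_counts($report);
printf "recomputed readability: %.2f\n", score(%count);
```

For the sample report this prints 4.74, matching the figure Fathom reported.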

Why This Should Be Hard To Do

Perl is an unusual programming language, in that it has dynamic syntax; that is, any programmer can write code that extends or changes the syntax of Perl. Consider the following code:

use Mystery;
if (mystery /1/ ...

You can’t parse this without knowing about Mystery.pm! Let’s consider two different versions of Mystery.pm.

Version 1:

package Mystery;
sub main::mystery { return 5; }
1;

Version 2:

package Mystery;
sub main::mystery() { return 5; }
1;

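These two packages differ only trivially. Both define one function, named mystery, which returns the value 5. However, the second version declares an empty prototype, (), which tells Perl's parser that mystery takes no arguments. Without the prototype, mystery is a list operator and the / that follows it must begin a regular expression, so in the first case our program parses as: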

if (mystery( the results of matching the regular expression /1/ ...

In the second case, it parses as:

if (mystery() divided by 1 divided by ...

By the time you’ve written a program that can successfully parse every possible case, you’ve rewritten Perl!
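
The two parses can be observed directly. The sketch below is an illustration only: the package names V1 and V2 and the test expression are invented, and the divisor is changed from 1 to 2 so that the two readings produce different values. The same source string is compiled twice with string eval, once in each package.

```perl
use strict;
use warnings;

# Version 1: no prototype, so the parser treats mystery as a list operator.
{ package V1; sub mystery { return 5 } }

# Version 2: empty prototype, so the parser knows mystery takes no arguments.
{ package V2; sub mystery() { return 5 } }

# Identical source text, chosen so that it is valid under both parses.
my $source = 'mystery /2/ + 1';

$_ = 'a2b';    # target for the regular expression in the V1 parse

# V1 parse: mystery( /2/ + 1 )   -> the argument is ignored, result is 5
# V2 parse: mystery() / 2 / +1   -> two divisions, result is 2.5
my $v1 = eval "package V1; $source";
my $v2 = eval "package V2; $source";

print "without prototype: $v1\n";
print "with prototype:    $v2\n";
```

The only difference between the two evaluations is the declaration that the parser has already seen, which is exactly the information a standalone parser would lack.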

The Perl Compiler to the Rescue

Fortunately, the Perl compiler gives us access to the pertinent guts of Perl, allowing us to calculate the tokens and expressions directly; see the Fathom source code for details. Without the compiler, this project would have been prohibitively difficult.
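
To give a feel for what the compiler exposes, here is a minimal sketch that uses the core B module to walk the op tree of one subroutine and tally its ops by name. It is not Fathom's code; the subroutine sample and the helper tally_ops are invented for the example, and how Fathom maps ops onto tokens, expressions, and statements is left to its source.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Hypothetical subroutine to inspect; any named sub would do.
sub sample {
    my $x = shift;
    return $x + 1;
}

# Recursively visit an op and its children, tallying ops by name.
sub tally_ops {
    my ($op, $tally) = @_;
    return unless $$op;                    # a null op ends the chain
    $tally->{ $op->name }++;
    return unless $op->flags & OPf_KIDS;   # no children to descend into
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        tally_ops($kid, $tally);
    }
}

my %tally;
tally_ops(svref_2object(\&sample)->ROOT, \%tally);   # ROOT is the sub's top op
printf "%-10s %d\n", $_, $tally{$_} for sort keys %tally;
```

Because Perl itself has already parsed the code, the walker never has to decide what a / means. A metric built this way could, for instance, treat nextstate ops as a rough statement count, though that mapping is an assumption of this sketch.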

Here are some examples of Fathom evaluations:

Benchmark.pm
27 tokens
7 expressions
5 statements
1 subroutine
readability is 2.91 (very readable)

Apache::AdBlocker
47 tokens
13 expressions
6 statements
1 subroutine
readability is 3.08 (readable)

CGI/Carp.pm
66 tokens
22 expressions
11 statements
1 subroutine
readability is 3.09 (readable)

perl5.005/eg/travesty
259 tokens
96 expressions
33 statements
1 subroutine
readability is 4.94 (easier than the norm)

s2p
2588 tokens
826 expressions
384 statements
11 subroutines
readability is 5.12 (mature)

CGI.pm
521 tokens
180 expressions
54 statements
1 subroutine
readability is 6.85 (complex)

DBI.pm
835 tokens
252 expressions
58 statements
1 subroutine
readability is 7.68 (very difficult)

diagnostics.pm
767 tokens
272 expressions
96 statements
1 subroutine
readability is 10.02 (obfuscated)

Future Directions

I intend to continue to refine Fathom in several ways: by tweaking its basic formula to produce more accurate grades, by considering the placement and length of comments and POD, by having it identify problematic code sections, and by having it make specific suggestions for improvement.

There are also some problems I hope to address in the near future: Fathom doesn’t see code that executes at compile time, such as code in BEGIN blocks or use statements, and sometimes it counts implicit tokens, such as $_ in a foreach statement. These limitations probably won’t make much statistical difference in a medium-to-large program, but they could give wildly strange grades to one-liners and other short hacks.

Fathom also opens the door to a whole suite of companion tools: a program that checks variable names against a site-wide naming policy; a tool, much like C’s indent, to normalize the indentation of Perl code; and likely several more tools, based on experience and feedback. Some of these are already being developed by others.

Perl’s extraordinary architecture makes it possible to produce very powerful companion tools without having to re-invent the wheel. Fathom was developed with a relatively small amount of original code—it simply hooks into the pre-existing Perl internal data structures to do its job. Similarly, the Perl debugger uses built-in features of Perl, plus a minimal amount of black magic, to provide a full-featured debugging environment for your Perl programs.

In most other languages, writing a tool like Fathom would force you to start from scratch: even some of the best tools for other languages (e.g., gdb, indent, and cxref for C) are built on code that is completely independent of the compilers or interpreters that they complement. In the case of languages that are still undergoing refinement (such as C++), maintenance of these tools can be a nightmare. However, Fathom will continue to work even if Perl’s syntax changes, because it’s hooked into the Perl compiler itself!

I hope that you’re so intrigued by Fathom that you’ll want to refine it, rewrite it, or develop new tools in a similar vein. Try this at home, kids!

Acknowledgments

Fathom would not have been possible without Malcolm Beattie’s outstanding work on the Perl compiler. Stephen McCamant’s B::Deparse module was tremendously helpful in demonstrating how to write a compiler backend. And, of course, I couldn’t have done any of this without such a rich language as Perl.
