What is good coding practice? What is readable code?
For some programmers, these questions lead to heated arguments. In the relatively young field of programming, it’s natural that generally accepted rules of style and usage haven’t yet emerged. Fortunately, our colleagues in the more mature field of philology (the study of language as used in literature) have set examples that we can follow. In this article, I’ll describe Fathom, a module that grades the readability of Perl programs.
You may have experience with the grammar check feature of some word processors, which finds likely spelling, grammar, and usage errors in your documents. These tools can be quite useful, particularly for people who don’t do much writing, or for people who haven’t had much writing instruction.
As a programmer who works mostly in teams, often training new or junior programmers during time-critical projects, I want automated ways to encourage compliance with team coding standards. I know that such tools can (and do) work for business writing, but I’ve been unable to find a tool that would do the job for business coding. I did some investigation to see if any of the available grammar checkers could be adapted for use with program code.
There are many well-known measures of readability in literature. You may have heard of Flesch-Kincaid, FOG, SMOG, Bormuth, or other readability or grade level tests; Microsoft Word uses three Flesch-Kincaid tools to evaluate style. These tests generally look at the average number of syllables per word and the average number of words per sentence, then report a single number which indicates either the grade level (1–12) or readability (usually 1–100) of the document. As an example, the Flesch-Kincaid formula for determining the grade level of a document is:
(average sentence length in words * 0.39) + (average syllables per word * 11.8) - 15.59
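To make the arithmetic concrete, here is a minimal sketch of the grade-level formula as a Perl subroutine. The subroutine name and the sample averages (15 words per sentence, 1.5 syllables per word) are illustrative assumptions, not taken from any real document or tool.

```perl
use strict;
use warnings;

# Flesch-Kincaid grade level from two averages, using the constants
# quoted in the formula above. The name flesch_kincaid_grade is hypothetical.
sub flesch_kincaid_grade {
    my ($words_per_sentence, $syllables_per_word) = @_;
    return 0.39 * $words_per_sentence
         + 11.8 * $syllables_per_word
         - 15.59;
}

# Hypothetical document: 15 words per sentence, 1.5 syllables per word.
# 0.39 * 15 + 11.8 * 1.5 - 15.59 = 7.96
printf "grade level: %.2f\n", flesch_kincaid_grade(15, 1.5);
```

Note that both inputs are averages over prose units (sentences, syllables), which is exactly what has no obvious counterpart in program code.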
Unfortunately, these measures don’t map well onto code; for example, how many syllables are there in ++ or { or $_? Is select easier to read than gethostbyname?
Once I realized that I wouldn’t be able to simply run one of the prose-readability tests on my code and get meaningful results, I began to study the design and function of those tests. Then, I constructed a working model for code readability.
After thinking about tools like Flesch-Kincaid, and discussing the idea of a readability tool with colleagues, I came up with a basic model for a code readability metric. I decided to measure the number of tokens per expression, the number of expressions per statement, and the number of statements per subroutine. Some sample tokens:
++
$foo::bar
;
{
&&
any keyword
Some sample expressions:
0.2
($a + 6)
wantarray ? @a : 0
And some sample statements:
$a = $foo::bar * 7;
$x++;
Given the basic model I’ve described, I wrote a module, Fathom, that grades the readability of a Perl program. It rates on an open-ended scale, where 1 indicates a trivial program, 5 indicates “mature” code, 7 indicates very sophisticated code, and anything over 7 is Very Hairy. I established the following norms for mature code:
3 tokens per expression
6 expressions per statement
20 statements per subroutine
From this, I came up with the following formula:
code complexity = (average expression length in tokens * 0.55) + (average statement length in expressions * 0.28) + (average subroutine length in statements * 0.08)
If you plug the norms (3, 6, 20) into this formula, you’ll see that ideal mature code actually gets a score of 4.93; that’s because I rounded all the multipliers to two decimal digits, to keep things simple.
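The calculation can be sketched in a few lines of Perl. This is not Fathom's own source; the subroutine name and the raw counts are assumptions chosen so that the three averages come out to exactly the norms (3, 6, 20).

```perl
use strict;
use warnings;

# Sketch of the scoring formula, taking raw counts and deriving the
# three averages. The name complexity_score is hypothetical.
sub complexity_score {
    my ($tokens, $expressions, $statements, $subroutines) = @_;
    die "counts must be positive\n"
        unless $expressions > 0 && $statements > 0 && $subroutines > 0;

    return ($tokens      / $expressions) * 0.55   # tokens per expression
         + ($expressions / $statements)  * 0.28   # expressions per statement
         + ($statements  / $subroutines) * 0.08;  # statements per subroutine
}

# 360 tokens, 120 expressions, 20 statements, 1 subroutine
# gives averages of 3, 6, and 20: the norms for mature code.
printf "%.2f\n", complexity_score(360, 120, 20, 1);   # 1.65 + 1.68 + 1.60 = 4.93
```

Because the last term is the only one that grows with subroutine length, a long subroutine raises the score even when each statement in it is simple.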
First, you’ll need to install Fathom. You can find it on CPAN, under authors/id/K/KS/KSTAR.
After installing Fathom, you can invoke it as follows:
perl -MO=Fathom filename
The output looks like this:
315 tokens
97 expressions
17 statements
1 subroutine
readability is 4.74 (easier than the norm)
Perl is an unusual programming language, in that it has dynamic syntax; that is, any programmer can write code that extends or changes the syntax of Perl. Consider the following code:
use Mystery;
if (mystery /1/ . . .
You can’t parse this without knowing about Mystery.pm! Let’s consider two different versions of Mystery.pm.
Version 1:
package Mystery;
sub main::mystery { return 5; }
1;
Version 2:
package Mystery;
sub main::mystery() { return 5; }
1;
These two packages are almost trivially different. They both define one function, named mystery, which returns the value 5. However, the second version uses a prototype. In the first case, our program parses as:
if (mystery( the results of matching the regular expression /1/ ...
In the second case, it parses as:
if (mystery() divided by 1 divided by ...
By the time you’ve written a program that can successfully parse every possible case, you’ve rewritten Perl!
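The effect is easy to reproduce in a single file. The sketch below uses two hypothetical subroutines, mystery_list and mystery_const, as stand-ins for the two versions of mystery; they are defined before use so that Perl already knows their prototypes when it reaches the ambiguous slash.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two versions of mystery().
sub mystery_list    { return "got " . scalar(@_) . " argument(s)" }  # no prototype
sub mystery_const() { return 5 }                                      # empty prototype

$_ = "a1b";

# No prototype: Perl expects a term next, so /1/ is a pattern match
# against $_ and its result is passed to the subroutine.
my $as_match = mystery_list /1/;

# Empty prototype: Perl expects an operator next, so each slash is a
# division and this is (5 / 1) / 5.
my $as_division = mystery_const /1/ 5;

print "$as_match\n";
print "$as_division\n";
```

The text after each subroutine name starts with the same characters, yet the two statements have different syntax trees. A static parser that has not seen the subroutine definitions has no way to choose between them.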
Fortunately, the Perl compiler gives us access to the pertinent guts of Perl, allowing us to calculate the tokens and expressions directly; see the Fathom source code for details. Without the compiler, this project would have been prohibitively difficult.
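As a rough illustration of the kind of access the compiler backend provides, the following sketch uses the core B module to walk the op tree of one subroutine and tally the ops by name. This is not how Fathom itself counts tokens or expressions (see its source for that); the subroutine sample and the method name tally_op are assumptions made for the example.

```perl
use strict;
use warnings;
use B qw(svref_2object walkoptree);

# A small subroutine to inspect.
sub sample {
    my $x = shift;
    $x++;
    return $x * 2;
}

my %op_count;

# walkoptree() calls the named method on every op in the tree.
# Defining the method in B::OP makes it available to all op classes.
sub B::OP::tally_op {
    my $op = shift;
    $op_count{ $op->name }++;
}

# Start from the root op of the compiled subroutine.
walkoptree(svref_2object(\&sample)->ROOT, 'tally_op');

printf "%-12s %d\n", $_, $op_count{$_} for sort keys %op_count;
```

Each statement boundary appears in the tree as a nextstate op, so even this crude tally hints at how statement counts can be recovered from the compiled form rather than from the source text. The exact set of ops reported varies between Perl versions.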
Here are some examples of Fathom evaluations:
Benchmark.pm: 27 tokens, 7 expressions, 5 statements, 1 subroutine; readability is 2.91 (very readable)
Apache::AdBlocker: 47 tokens, 13 expressions, 6 statements, 1 subroutine; readability is 3.08 (readable)
CGI/Carp.pm: 66 tokens, 22 expressions, 11 statements, 1 subroutine; readability is 3.09 (readable)
perl5.005/eg/travesty: 259 tokens, 96 expressions, 33 statements, 1 subroutine; readability is 4.94 (easier than the norm)
s2p: 2588 tokens, 826 expressions, 384 statements, 11 subroutines; readability is 5.12 (mature)
CGI.pm: 521 tokens, 180 expressions, 54 statements, 1 subroutine; readability is 6.85 (complex)
DBI.pm: 835 tokens, 252 expressions, 58 statements, 1 subroutine; readability is 7.68 (very difficult)
diagnostics.pm: 767 tokens, 272 expressions, 96 statements, 1 subroutine; readability is 10.02 (obfuscated)
I intend to continue to refine Fathom in several ways: by tweaking its basic formula to produce more accurate grades, by considering the placement and length of comments and pods, by having it identify problematic code sections, and by having it make specific suggestions for improvement.
There are also some problems I hope to address in the near future: Fathom doesn’t see code that executes at compile time, such as code in BEGIN blocks or use statements, and sometimes it counts implicit tokens, such as $_ in a foreach statement. These limitations probably won’t make much statistical difference in a medium-to-large program, but they could give wildly strange grades to one-liners and other short hacks.
Fathom also opens the door to a whole suite of companion tools: a program that checks variable names against a site-wide naming policy; a tool, much like C’s indent, to normalize the indentation of Perl code; and likely several more tools, based on experience and feedback. Some of these are already being developed by others.
Perl’s extraordinary architecture makes it possible to produce very powerful companion tools without having to re-invent the wheel. Fathom was developed with a relatively small amount of original code—it simply hooks into the pre-existing Perl internal data structures to do its job. Similarly, the Perl debugger uses built-in features of Perl, plus a minimal amount of black magic, to provide a full-featured debugging environment for your Perl programs.
In most other languages, writing a tool like Fathom would force you to start from scratch, since some of the best tools for other languages (e.g., gdb, indent, and cxref for C) are based on code that is completely independent from the compilers or interpreters that they complement. In the case of languages that are still undergoing refinement (such as C++), maintenance of these tools can be a nightmare. However, Fathom will continue to work even if Perl’s syntax changes, because it’s hooked into the Perl compiler itself!
I hope that you’re so intrigued by Fathom that you’ll want to refine it, rewrite it, or develop new tools in a similar vein. Try this at home, kids!