Computer Science & Perl Programming

Chapter 1. Introduction

Jon Orwant

“Perl is a language for getting your job done,” begins Programming Perl. As programming languages go, Perl is something of a grab bag, and so is this book.

In this introduction I’ll tell you how the book came to be, first by talking about the history of TPJ, and then about why computer science and Perl programming are a natural combination.

History of TPJ

In 1995, I was angry. Perl had broken away from being stereotyped as a system administration language or text processing language, and had managed to claw itself up to merely being stereotyped as a web programming language. I had seen Perl used for AI, astronomy, biology, graphics, natural language processing, and other areas, but Perl’s generality wasn’t being communicated to the programming world. Perl wasn’t getting the reputation it deserved.

So when Tom Christiansen floated the notion of a Perl newsletter on the perl5-porters mailing list, it seemed like a natural idea. I’d just seen my first Perl book printed with my ampersands translated into eights, my vertical bars translated into ones, and my bullet marks depicted as planets complete with rings. (As you might guess, the publisher wasn’t O’Reilly.) I wanted to do Perl publishing right, and at the same time show the world that Perl wasn’t just for system administration any more. And so I set to work with my NeXT workstation and a copy of FrameMaker. I found a Boston area printer via the Yellow Pages, and hit up the Perl gurus for articles. I announced the magazine on Usenet, and that was the extent of my marketing.

The reception was mostly enthusiastic, although there was some initial skepticism: people said I was crazy to attempt print rather than web publication. But print has a portability and resolution unrivalled by computer displays, and professional printing provides a sense of permanence that web sites can’t match. Paper affords a control over the graphical layout that is hard to achieve in a browser (even with Cascading Style Sheets). For instance, in my TPJ article on Data Hiding, I hid a message in the spacing between letters, and screened in a faint watermark on the page. And I had a hidden message perpetrated on me in the cover of TPJ #3, where photographer Alan Blount hid “perl sux” in his cover photograph.

I also knew that it would be too easy to let quality slip with a web magazine. The high cost of printing gives each issue a stamp of finality; in contrast, a mistake on a web page could always be fixed later. Masochistic as it sounds, I wanted the deadlines that ink-on-dead-trees printing imposed. And print has more prestige: I wanted Perl to get the respect it deserved, and that meant people finding the magazine in their local bookstore.

Content was the easy part—there’s never been a shortage of people creating interesting applications with Perl, and there are enough nooks and crannies to the language that I was never low on article ideas. By 1999, I had between 50 and 70 article proposals pending at any time, and space for only 15 in each issue. It was the design and visual appearance I was worried about: making it look like a magazine. For the first issue, I bought a stuffed camel at FAO Schwarz and had it photographed for the cover. I couldn’t afford full color printing; I had only enough money for one spot color, and I chose brown for the first issue so that the camel would look natural.

I established the company with $20 and a trip to City Hall in Cambridge, and on a blustery day in February 1996 I printed TPJ’s first issue: five articles and 32 pages. With no idea how popular the magazine would be, and aware of how much it costs to reprint an issue, I decided to aim high: 5,000 copies.

The volume of 5,000 copies of a 32-page magazine, each page slightly thicker than newsprint, was not a calculation that crossed my mind until an eighteen-wheeler pulled up and unloaded 22 boxes of magazines into my tiny two-room apartment. The boxes were stacked from floor to ceiling; I was eating off them, and by the time the deadline for TPJ #2 arrived I was sleeping on them as well.

Over the next year I learned all about United States Postal Service regulations, courtesy of the nice folks at Boston’s South Station 24-hour post office and a scary tattooed bulk-mail freak at Cambridge’s Central Square post office. I learned about presorting mail with rubber bands into dirty sacks with “U.S. MAIL” stenciled on them. I bought Glu-Stik by the case for attaching address labels. I got a swollen tongue from licking too many stamps back in the days before I splurged on a postage meter.

And I hacked. I wrote Perl programs to generate PostScript UPC bar codes, print address labels, and convert author drafts from HTML, LaTeX, pod, and plain text into typeset pages. I wrote code to verify the accuracy of the programs in the magazine, correct grammar and spelling mistakes, and maintain subscriber, author, and advertiser databases. I created an entire subscription management system that answered many of the common subscriber requests, from address changes to questions about when the next issue would arrive and when subscriptions would expire. I probably wrote hundreds of quick one-off programs to do things like generate an ASCII copy of the magazine for TPJ’s sole blind subscriber, and to compute circulation demographics (the top Perl countries are, in order, the U.S., the U.K., Germany, Canada, Australia, France, Japan, Switzerland, Sweden, Holland, Norway, Denmark, Finland, and Italy). I wrote the Business::CreditCard module to verify credit card numbers once I became able to accept VISA and MasterCard later in 1996.

To accept credit card numbers, I had to visit my bank and convince a loan officer that they should give a student with a few thousand dollars to his name, living in a rent-controlled apartment with only his 21-inch computer monitor as collateral, the ability to withdraw arbitrarily large sums of money from VISA and MasterCard accounts. That was the one day in TPJ’s history when I wore a suit, and it paid off; they approved my application, and six weeks later I was accepting credit cards. (My application was delayed because I lived on Pearl Street and wanted an account in the name of the Perl Journal, which they assumed was a typo.)

I toddled into my favorite bookstore with a box of TPJ #2 and naively asked if they wanted to stock the magazine. There I learned about magazine distribution and consignment sales (if a bookstore doesn’t sell a book or magazine, they can send it back to the publisher for a full refund).

With TPJ #4, I decided to go glossy—just the cover at first. This was a big decision; after my initial $3,000 investment, I had only grown the magazine as much as revenue permitted, and a glossy cover meant dipping into my savings again. I realized this was the right choice at the 1997 Usenix technical convention, when an attendee uttered these telling words: “Cool! So it’s a magazine now. I’ll subscribe.”

Sales took off from there, and the next few years saw the magazine taking an increasingly large amount of my time. I would typically stay up all night before my scheduled press date, tweaking fonts, doing last-minute proofreading, and shifting ads around. Since I hated reading magazines with jumps (e.g., “Continued on page 53”), I vowed never to do that, no matter how difficult it made layout.

The magazine grew, and I had offers to translate it into other languages and sell posters of some of the covers. I received moral support, good advice, and marketing agreements with O’Reilly (and to a lesser extent, the Linux Journal). I endured shipping problems in Canada and credit card fraud in the Ukraine, but in spite of the occasional bad apple I enjoyed making personal connections with subscribers, many of whom were surprised to be corresponding directly with the editor-in-chief. I considered branching out into other magazine areas, or even novelties—I sold Magnetic Perl Poetry Kits briefly in 1998. But the all-nighters got old after a while; producing a magazine solo was getting to be too much work, and it was taking time away from my day job as a graduate student. The magazine was growing too fast for me to keep up, and so in 1999 I sold it to EarthWeb, staying on as editor. They [TEXT DELETED ON ADVICE OF COUNSEL]. In December 2000, they suspended TPJ #20 one day before it was to be printed.

In March 2001, I took them to arbitration and got the magazine back, which I lateralled to CMP, publisher of Dr. Dobb’s, Sys Admin, and C/C++ Users Journal. (My day job here at O’Reilly prevents me from returning to that frenzied pace again.) It’s now in good hands, and Perl’s future is looking brighter than ever with the advent of Perl 6.

Computer Science and Perl Programming

When you pursue a computer science degree, you learn about not just computers but computability; not just how to program, but strategies for solving problems and expressing those solutions as algorithms. But what you don’t often learn is “computer science in the wild”—how the lofty abstractions, generalizations, and precepts are implemented in the real world.

Perl is very much a real world language. It’s been taught in middle schools all the way up through graduate programs, but it’s not the best first language for computer science students, partly because it does so much for you, and partly because it’s so expressive that it allows you to program badly. This is exactly what you want if you need to dash off a one-liner to generate a report from the company database in the next minute, but it’s not desirable in a computer science curriculum where purity is valued over expedience.

If you were taking a class on compilers, you’d learn about how programs are turned from source code into binaries. Typically, this is expressed in several phases: lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. And in that class, you’d write a simple compiler for a toy language, perhaps taking a couple of weeks to implement each of these phases. Very clean.

Now consider how Perl parses programs, as described in the article Lexical Analysis. Perl’s semantic analysis affects its lexical analysis, so they occur at the same time. Unclean.

The programming component of my undergraduate computer science education primarily used Scheme, a dialect of LISP. It’s as clean as a language can be, with a mathematical simplicity and elegance. I believe that every freshman should study LISP, and I recommend my undergraduate text: Structure and Interpretation of Computer Programs (MIT Press). Scheme is the perfect instructional language because its syntax is minimal.

Perl, it might be said, has maximal syntax. A few keystrokes can do a lot. One of the notions of Huffman coding (discussed in Compression) is that frequently occurring things should be represented more concisely than infrequently occurring things; that’s why an E in Morse code is a single dot while a Z is dash dash dot dot, and that’s why the function to search and replace strings in Perl is an s while the operator to translate a network protocol number to its corresponding name is getprotobynumber. (Neither of these situations occurred from explicit design, since each operator got its name from already existing libraries. Sometimes good design just evolves naturally out of common usage.) Scalars begin with $, arrays begin with @, and hashes begin with %. Perl’s punctuation holds a great deal of meaning, enabling you to express a lot with a little.
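To make the point about punctuation concrete, here is a minimal sketch that puts the three sigils, the one-letter s operator, and the long-named getprotobynumber side by side. The variable names and sample sentence are invented for illustration, and the protocol lookup depends on the local system's protocols database, so its result is not guaranteed.

```perl
use strict;
use warnings;

# Illustrative data only; none of these names come from the book.
my $sentence = "perl is a grab bag";                  # $ marks a scalar
my @words    = split ' ', $sentence;                  # @ marks an array
my %length   = map { ($_ => length $_) } @words;      # % marks a hash

# The frequently used search-and-replace operator is a single letter.
# The substitution is applied to a copy, leaving $sentence untouched.
(my $title = $sentence) =~ s/grab bag/language/;

# The rarely needed lookup gets the long name. The answer comes from the
# system's protocols database, so it may be undefined on some machines.
my $protocol = getprotobynumber(6);

print "$title\n";
print scalar(@words), " words; 'grab' has $length{grab} letters\n";
print "protocol 6 is ", (defined $protocol ? $protocol : "unknown"), "\n";
```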

Minimal syntax languages such as Scheme are the best for learning about computer science, and maximal syntax languages such as Perl are the best for getting your job done. This book illuminates some selected corners of computer science with Perl—certainly no substitute for a real computer science book, but a helpful complement, showing you how to apply some of those concepts to get your job done. You’ll learn about high-concept data structures like infinite lists and B-trees, and how to create your own data structures like the Schmidt Hash. You’ll see how generic concepts like a memoizing cache can be implemented, and learn to write your own parsers—not the clean parsers you’d create in the aforementioned class on compilers, but potentially messy parsers that end up being a whole lot more useful.
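As a taste of what a memoizing cache looks like in practice, the following is a minimal sketch rather than the implementation from any article in this book. The helper name memoize_sub is hypothetical, and the sketch assumes the function's arguments are plain strings or numbers that can be joined into a hash key.

```perl
use strict;
use warnings;

# memoize_sub is a hypothetical helper written for this sketch, not the
# interface of any particular module. It wraps a code reference in a
# closure that remembers each result, keyed by the argument list.
sub memoize_sub {
    my ($code) = @_;
    my %cache;
    return sub {
        # Assumes arguments are plain strings or numbers.
        my $key = join "\x1C", @_;
        $cache{$key} = $code->(@_) unless exists $cache{$key};
        return $cache{$key};
    };
}

my $calls = 0;    # counts how often the uncached function body runs
my $fib;
$fib = memoize_sub(sub {
    my ($n) = @_;
    $calls++;
    return $n < 2 ? $n : $fib->($n - 1) + $fib->($n - 2);
});

my $result = $fib->(30);
print "fib(30) = $result after $calls calls to the real function\n";
```

Because the recursive calls go through the caching wrapper, each value from 0 through 30 is computed only once; without the cache the same function body would run more than a million times.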

Most of the articles in this book will teach you some principle that you can apply beyond Perl programming. For some articles, the application is obvious: Client-Server Applications shows how to create your own network service on the Internet; Information Retrieval teaches the basics of information retrieval using Perl; Making Life and Death Decisions with Perl demonstrates the basics of conditional probability. For others, the relationship to computer science is a bit more subtle. Building Software with Cons, for instance, is ostensibly about a replacement for make, but even this pragmatic topic takes an academic twist. How can a build system determine whether a file has been modified since the last build? make looks at the file’s modification date; seemingly sensible, until you realize that clock skew dooms this approach. So Cons computes an MD5 cryptographic signature of each file. Reading the article makes it obvious that this is the right solution, and yet no one ever integrated it into make.
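The idea of tracking content rather than timestamps can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not Cons's actual code: the helpers file_signature, has_changed, and write_file are hypothetical names, and the signatures are kept in an in-memory hash rather than in whatever store a real build tool would use.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical helpers for this sketch; they are not Cons's actual code.
sub file_signature {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $signature = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $signature;
}

# Returns 1 when the content differs from what was recorded last time
# (or was never recorded), and stores the new signature in %$seen.
sub has_changed {
    my ($path, $seen) = @_;
    my $new = file_signature($path);
    my $old = $seen->{$path};
    $seen->{$path} = $new;
    return (!defined($old) || $old ne $new) ? 1 : 0;
}

sub write_file {
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}

my (undef, $source) = tempfile(UNLINK => 1);
my %seen;

write_file($source, "int main(void) { return 0; }\n");
my $first = has_changed($source, \%seen);          # never seen before

# Simulate clock skew: a new timestamp on identical bytes.
my $later = time() + 3600;
utime $later, $later, $source;
my $after_touch = has_changed($source, \%seen);

write_file($source, "int main(void) { return 1; }\n");
my $after_edit = has_changed($source, \%seen);

print "first build: $first, after touch: $after_touch, after edit: $after_edit\n";
```

A timestamp comparison would treat the touched file as modified; the signature comparison reports a change only when the bytes differ.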

You’ll learn about different programming paradigms. In a computer science curriculum, you’d learn about the advantages of object-oriented programming. You probably wouldn’t learn about the disadvantages of OO, since that’s messy real world stuff, having to do with speed and program maintainability. Here, you’ll see ways to fiddle with Perl’s OO to make it messier or cleaner, whatever suits the application at hand. You’ll see how to insert a “source filter” into your program immediately before Perl begins lexical analysis—again, not something you’ll learn in a compiler class, but interesting if only for the fact that it’s necessary in the real world. In Using Other Languages from Perl, you’ll see how to have your program trigger the compilation of programs in other languages, enabling you to use C or Java or assembly language from Perl. Unclean, but incredibly useful.

As another example of the occasional divergence between clean computer science and the messy real world, you’ll learn about variable scope in Scoping; how it was done imperfectly in Perl 4, and how Perl 5 and 5.6 were able to improve Perl’s scoping behavior while maintaining backward compatibility.

One final example: A class on theoretical computer science will teach you the difference between deterministic and nondeterministic finite automata, two abstractions used to explore computability. In Understanding Regular Expressions, Part I, you’ll learn why that difference directly impacts the speed of regular expressions in different languages—an understanding that enables you to see why a particular Perl regex takes hours to run, while a slight variant takes only a few seconds. That’s why understanding the underlying computer science can help Perl programmers function even better in the real world.
