Mastering Algorithms with Perl

Preface

Perl’s popularity has soared in recent years. It owes its appeal first to its technical superiority: Perl’s unparalleled portability, speed, and expressiveness have made it the language of choice for a million programmers worldwide.

Those programmers have extended Perl in ways unimaginable with languages controlled by committees or companies. Of all languages, Perl has the largest base of free utilities, thanks to the Comprehensive Perl Archive Network (abbreviated CPAN; see http://www.perl.com/CPAN/).The modules and scripts you’ll find there have made Perl the most popular language for web, text, and database programming.

But Perl can do more than that. You can solve complex problems in Perl more quickly, and in fewer lines, than in any other language.

This ease of use makes Perl an excellent tool for exploring algorithms. Computer science embraces complexity; the essence of programming is the clean dissection of a seemingly insurmountable problem into a series of simple, computable steps. Perl is ideal for tackling the tougher nuggets of computer science because its liberal syntax lets the programmer express his or her solution in the manner best suited to the task. (After all, Perl’s motto is There’s More Than One Way To Do It.) Algorithms are complex enough; we don’t need a computer language making it any tougher.

Most books about computer algorithms don’t include working programs. They express their ideas in quasi-English pseudocode instead, which allows the discussion to focus on concepts without getting bogged down in implementation details. But sometimes the details are what matter—the inefficiencies of a bad implementation sometimes cancel the speedup that a good algorithm provides. The devil is in the details.

And while converting ideas to programs is often a good exercise, it’s also just plain time-consuming. So, in this book we’ve supplied you with not just explanations, but implementations as well. If you read this book carefully, you’ll learn more about both algorithms and Perl.

About This Book

This book is written for two kinds of people: those who want cut and paste solutions and those who want to hone their programming skills. You’ll see how we solve some of the classic problems of computer science and why we solved them the way we did.

Theory or Practice?

Like the wolf featured on the cover, this book is sometimes fierce and sometimes playful. The fierce part is the computer science: we’ll often talk like computer scientists talk and discuss problems that matter little to the practical Perl programmer. Other times, we’ll playfully explain the problem and simply tell you about ready-made solutions you can find on the Internet (almost always on CPAN).

Deciding when to be fierce and when to be playful hasn’t been easy for us. For instance, every algorithms textbook has a chapter on all of the different ways to sort a collection of items. So do we, even though Perl provides its own sort() function that might be all you ever need. We do this for four reasons. First, we don’t want you thinking you’ve Mastered Algorithms without understanding the algorithms covered in every college course on the subject. Second, the concepts, processes, and strategies underlying those algorithms will come in handy for more than just sorting. Third, it helps to know how Perl’s sort() works under the hood, why its particular algorithm (quicksort) was used, and how to avoid some of the inefficiencies that even experienced Perl programmers fall prey to. Finally, sort() isn’t always the best solution! Someday, you might need another of the techniques we provide.

When it comes to the inevitable tradeoffs between theory and practice, programmers’ tastes vary. We have chosen a middle course, swiftly pouncing from one to the other with feral abandon. If your tastes are exclusively theoretical or practical, we hope you’ll still appreciate the balanced diet you’ll find here.

Organization of This Book

The chapters in this book can be read in isolation; they typically don’t require knowledge from previous chapters. However, we do recommend that you read at least Chapter 1, and Chapter 2, which provide the basic material necessary for understanding the rest of the book.

Chapter 1 describes the basics of Perl and algorithms, with an emphasis on speed and general problem-solving techniques.

Chapter 2 explains how to use Perl to create simple and very general representations, like queues and lists of lists.

Chapter 3 shows how to build the classic computer science data structures.

Chapter 4 looks at techniques for ordering data and compares the advantages of each technique.

Chapter 5 investigates ways to extract individual pieces of information from a larger collection.

Chapter 6 discusses the basics of set theory and Perl implementations of set operations.

Chapter 7 examines techniques for manipulating large arrays of data and solving problems in linear algebra.

Chapter 8 describes tools for solving problems that are best represented as a graph: a collection of nodes connected by edges.

Chapter 9 explains how to implement algorithms for searching, filtering, and parsing strings of text.

Chapter 10 looks at techniques for computing with two- and three-dimensional constructs.

Chapter 11 investigates methods for generating important constants, functions, and number series, as well as manipulating numbers in alternate coordinate systems.

Chapter 12 examines algorithms for factoring numbers, modular arithmetic, and other techniques for computing with integers.

Chapter 13 demonstrates Perl utilities to conceal your data from prying eyes.

Chapter 14 discusses how to use Perl for problems involving chance.

Chapter 15 describes methods for analyzing the accuracy of hypotheses and characterizing the distribution of data.

Chapter 16 looks at a few of the more common problems in scientific computing.

Appendix A contains an annotated bibliography.

Appendix B lists the seven-bit ASCII character set used by default when Perl sorts strings.

Conventions Used in This Book

Italic: Used for filenames, directory names, URLs, and occasional emphasis.
Constant width: Used for elements of programming languages, text manipulated by programs, code examples, and output.
Constant width bold: Used for user input and for emphasis in code.
Constant width italic: Used for replaceable values.

What You Should Know Before Reading This Book

Algorithms are typically the subject of an entire upper-level undergraduate course in computer science departments. Obviously, we cannot hope to provide all of the mathematical and programming background you’ll need to get the most out of this book. We believe that the best way to teach is never to coddle, but to explain complex concepts in an entertaining fashion and thoroughly ground them in applications whenever possible. You don’t need to be a computer scientist to read this book, but once you’ve read it you might feel justified calling yourself one.

That said, if you don’t know Perl, you don’t want to start here. We recommend you begin with either of these books published by O’Reilly & Associates: Randal L. Schwartz and Tom Christiansen’s Learning Perl if you’re new to programming, and Larry Wall, Tom Christiansen, and Randal L. Schwartz’s Programming Perl if you’re not.

If you want more rigorous explanations of the algorithms discussed in this book, we recommend either Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest’s Introduction to Algorithms, published by MIT Press, or Donald Knuth’s The Art of Computer Programming, Volume 1 (Fundamental Algorithms) in particular. See Appendix A for full bibliographic information.

What You Should Have Before Reading This Book

This book assumes you have Perl 5.004 or better. If you don’t, you can download it for free from http://www.perl.com/CPAN/src.

This book often refers to CPAN modules, which are packages of Perl code you can download for free from http://www.perl.com/CPAN/modules/by-module/.In particular, the CPAN.pm module (http://www.perl.com/CPAN/modules/by-module/CPAN)can automatically download, build, and install CPAN modules for you.

Typically, the modules in CPAN are usually quite robust because they’re tested and used by large user populations. You can check the Modules List (reachable by a link from http://www.perl.com/CPAN/CPAN.html)to see how authors rate their modules; as a module rating moves through “idea,” “under construction,” “alpha,” “beta,” and finally to “Released,” there is an increasing likelihood that it will behave properly.

Online Information About This Book

All of the programs in this book are available online from ftp://ftp.oreilly.com/,in the directory /pub/examples/perl/algorithms/examples.tar.gz. If we learn of any errors in this book, you’ll be able to find them at /pub/examples/perl/algorithms/errata.txt.

Acknowledgments

Jon Orwant: I would like to thank all of the biological and computational entities that have made this book possible. At the Media Laboratory, Walter Bender has somehow managed to look the other way for twelve years while my distractions got the better of me. Various past and present Media Labbers helped shape this book, knowingly or not: Nathan Abramson, Amy Bruckman, Bill Butera, Pascal Chesnais, Judith Donath, Klee Dienes, Roger Kermode, Doug Koen, Michelle Mcdonald, Chris Metcalfe, Warren Sack, Sunil Vemuri, and Chris Verplaetse. The Miracle Crew helped in ways intangible, so thanks to Alan Blount, Richard Christie, Diego Garcia, Carolyn Grantham, and Kyle Pope.

When Media Lab research didn’t steal time from algorithms, The Perl Journal did, and so I’d like to thank the people who helped ease the burden of running the magazine: Graham Barr, David Blank-Edelman, Alan Blount, Sean M. Burke, Mark-Jason Dominus, Brian D. Foy, Jeffrey Friedl, Felix Gallo, Kevin Lenzo, Steve Lidie, Tuomas J. Lukka, Chris Nandor, Sara Ontiveros, Tim O’Reilly, Randy Ray, John Redford, Chip Salzenberg, Gurusamy Sarathy, Lincoln D. Stein, Mike Stok, and all of the other contributors. Fellow philologist Tom Christiansen helped birth the magazine, fellow sushi-lover Sara Ontiveros helped make operations bearable, and fellow propagandist Nathan Torkington soon became indispensable.

Sandy Aronson, Francesca Pardo, Kim Scearce, and my parents, Jack and Carol, have all tolerated and occasionally even encouraged my addiction to the computational arts. Finally, Alan Blount and Nathan Torkington remain strikingly kindred spirits, and Robin Lucas has been a continuous source of comfort and joy.

Jarkko, John, and I would like to thank our team of technical reviewers: Tom Christiansen, Damian Conway, Mark-Jason Dominus, Daniel Dreilinger, Dan Gruhl, Andi Karrer, Mike Stok, Jeff Sumler, Sekhar Tatikonda, Nathan Torkington, and the enigmatic Abigail. Their boundless expertise made this book substantially better. Abigail, Mark-Jason, Nathan, Tom, and Damian went above and beyond the call of duty.

We would also like to thank the talented staff at O’Reilly for making this book possible, and for their support of Perl in general. Andy Oram prodded us just the right amount, and his acute editorial eye helped the book in countless ways. Melanie Wang, our production editor, paid unbelievably exquisite attention to the tiniest details; Rhon Porter and Rob Romano made our illustrations crisp and clean; and Lenny Muellner coped with our SGML.

As an editor and publisher, I’ve learned (usually the hard way) about the difficulties of editing and disseminating Perl content. Having written a Perl book with another publisher, I’ve learned how badly some of the publishing roles can be performed. And I quite simply cannot envision a better collection of talent than the folks at O’Reilly. So in addition to the people who worked on our book, I’d personally like to thank Gina Blaber, Mark Brokering, Mark Jacobsen, Lisa Mann, Linda Mui, Tim O’Reilly, Madeleine Schnapp, Ellen Silver, Lisa Sloan, Linda Walsh, Frank Willison, and all the other people I’ve had the pleasure of working with at O’Reilly & Associates. Keep up the good work. Finally, we would all like to thank Larry Wall and the rest of the Perl community for making the language as fun as it is.

Jarkko Hietaniemi: I want to thank my parents for their guidance, which led me to become so hopelessly interested in so many things, including algorithms and Perl. My little sister I want to thank for being herself. Nokia Research Center I need to thank for allowing me to write this book even though it took much longer than originally planned. My friends and colleagues I must thank for goading me on by constantly asking how the book was doing.

John Macdonald: First and foremost, I want to thank my wife, Chris. Her love, support, and assistance was unflagging, even when the “one year offline” to write the book continued to extend through the entirety of her “one year offline” to pursue further studies at university. An additional special mention goes to Ailsa for many weekends of child-sitting while both parents were offline. Much thanks to Elegant Communications for providing access to significant amounts of computer resources, many dead trees, and much general assistance. Thanks to Bill Mustard for the two-year loan of a portion of his library and for acting as a sounding board on numerous occasions. I’ve also received a great deal of support and encouragement from many other family members, friends, and co-workers (these groups overlap).

Comments and Questions

Please address comments and questions concerning this book to the publisher:

O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
800-998-9938 (in the U.S. or Canada)
707-829-0515 (international/local)
707-829-0104 (FAX)

You can also send us messages electronically. To be put on our mailing list or to request a catalog, send email to:

info@oreilly.com

To ask technical questions or comment on the book, send email to: