Chapter 9. Strings

Big words are always punished.

Sophocles, Antigone (442 B.C.E.)

Perl excels at string matching: the e of Perl, “extraction,” refers to identifying particular chunks of text in documents. In this chapter we describe the difficulties inherent in matching strings and explore the best-known matching algorithms.

There’s more to matching than the regular expressions so dear to every veteran Perl programmer. Approximate matching (also known as fuzzy matching) relaxes the all-or-nothing nature of exact matching. More specific types of matching often have particular linguistic and structural goals in mind:

  • phonetic matching

  • stemming

  • inflection

  • lexing

  • parsing
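To make “loosening the all-or-nothing nature of matching” concrete, here is a minimal sketch of one common approach: measuring Levenshtein (edit) distance and accepting a candidate if it lies within a tolerance. This is only an illustration and not necessarily the technique the chapter develops. The helper names edit_distance() and fuzzy_match() are hypothetical. The only module used is the core List::Util.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein (edit) distance: the minimum number of single-character
# insertions, deletions, and substitutions that turn $s into $t.
# (Hypothetical helper, for illustration only.)
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # @prev holds the previous row of the dynamic-programming table;
    # row 0 is the cost of building each prefix of $t from an empty string.
    my @prev = (0 .. scalar(@t));
    for my $i (1 .. scalar(@s)) {
        my @cur = ($i);    # cost of deleting the first $i chars of $s
        for my $j (1 .. scalar(@t)) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $cur[$j] = min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or exact match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# A fuzzy match: accept $candidate if it is within $max edits of $pattern.
# (Hypothetical helper, for illustration only.)
sub fuzzy_match {
    my ($pattern, $candidate, $max) = @_;
    return edit_distance($pattern, $candidate) <= $max;
}

print edit_distance("kitten", "sitting"), "\n";    # prints 3
print fuzzy_match("algorithm", "algoritm", 1) ? "match\n" : "no match\n";
```

The sketch keeps only two rows of the table, so memory grows with the length of the second string rather than with the product of both lengths. An exact match is simply the special case $max == 0.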

In this chapter we will briefly review Perl’s string matching and then embark on a tour of string matching algorithms, some of which Perl uses internally and others of which are encapsulated as Perl modules. Finally, we’ll discuss compression: the art of shrinking data (typically text).

Perl Builtins

We won’t spend much time on the well-known and much-beloved Perl features for string matching. But some of the tips in this section may save you some time on your next global search.

Exact Matching

The best tool in Perl for finding an exact string within another string (a scalar) is not the match operator m// but the much faster index() function. Use it whenever the text you are looking for is literal text: whenever you don’t need additional metanotation like “at the beginning of the string” or “any character,” use index() ...
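The sketch below compares index() with an equivalent m// search for a literal substring. It uses only core builtins: index(), \Q...\E quoting, and the @- match-offset array. The sample text and the all_offsets() helper are made up for the example.

```perl
use strict;
use warnings;

my $text   = "the quick brown fox jumps over the lazy dog";
my $needle = "fox";

# index() returns the 0-based offset of the first occurrence, or -1 if absent.
my $pos = index($text, $needle);

# The same search with the match operator. \Q...\E (quotemeta) makes any
# regex metacharacters in $needle match literally; $-[0] is the match start.
my $re_pos = $text =~ /\Q$needle\E/ ? $-[0] : -1;

print "index: $pos, m//: $re_pos\n";    # prints "index: 16, m//: 16"

# Hypothetical helper: find every occurrence with index() by resuming the
# search one character past the previous hit (so overlapping hits count).
sub all_offsets {
    my ($haystack, $sub) = @_;
    return () if $sub eq '';    # an empty needle would match everywhere
    my @offsets;
    my $from = 0;
    while ((my $i = index($haystack, $sub, $from)) != -1) {
        push @offsets, $i;
        $from = $i + 1;
    }
    return @offsets;
}

my @hits = all_offsets($text, "the");
print "'the' found at: @hits\n";        # prints "'the' found at: 0 31"
```

Note that the regex version needs \Q...\E to behave like index(). Without it, a needle such as "a.b" would also match "axb", because "." is regex metanotation for “any character.”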

