Big words are always punished.
Perl excels at string matching: the E in Perl, “extraction,” refers to identifying particular chunks of text in documents. In this chapter we describe the difficulties inherent in matching strings and explore the best-known matching algorithms.
There’s more to matching than the regular expressions so dear to every veteran Perl programmer. Approximate matching (also known as fuzzy matching) lets you loosen the all-or-nothing nature of matching. More specific types of matching often have particular linguistic and structural goals in mind:
In this chapter we will briefly review Perl’s string matching, and then embark on a tour of string matching algorithms, some used internally by Perl and others available as Perl modules. Finally, we’ll discuss compression: the art of shrinking data (typically text).
We won’t spend much time on the well-known and much-beloved Perl features for string matching, but some of the tips in this section may still save you time on your next global search.
The best tool in Perl for finding an exact string within another string (scalar) is not the match operator m// but the much faster index() function. Use it whenever the text you are looking for is straight text. Whenever you don’t need additional metanotation like “at the beginning of the string” or “any character,” use