Chapter 16. String Searching
The problem of finding one string within another comes up quite often: Searching through files on disk, DNA searches, and even Google rely on strategies for efficiently searching through text. If you've ever used a word processor or text editor or even the editor used for writing code, you have at some stage or another performed a string search. You may know it as the Find
function.
There are many string searching algorithms — and no doubt many more will be discovered over time — each with its own optimizations for handling specific types of data. Some algorithms work better for plain text, while others work better for text and/or patterns containing a lot of repetition, such as DNA fragments.
This chapter covers two algorithms for plain-text searching. We start with an obvious brute-force algorithm and move on to the more sophisticated Boyer-Moore. Each is described in detail, and then you will see how a relatively simple twist on the brute-force approach enables the Boyer-Moore algorithm to perform significantly faster.
After reading this chapter you should be able to do the following:
Describe and implement a brute-force string searching algorithm
Describe and implement the Boyer-Moore string searching algorithm
Understand the performance characteristics of each algorithm
Describe and implement a generic string match iterator
Describe and implement a simple file searching application
A Generic String Searcher Interface
Because we want to be able to implement ...
Get Beginning Algorithms now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.