Chapter 5. Practical Regex Techniques

Now that we’ve covered the basic mechanics of writing regular expressions, I’d like to put that understanding to work on situations more complex than those in earlier chapters. Every regex strikes a balance between matching what you want and not matching what you don’t want. We’ve already seen plenty of examples where greediness can be your friend if used skillfully and can lead to pitfalls if you’re not careful, and we’ll see plenty more in this chapter.
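As a quick reminder of what that balance looks like in practice, here is a minimal Python sketch. The sample text and variable names are assumptions made up for illustration; they are not taken from this chapter. It contrasts a greedy dot-star, which matches more than we probably want, with a negated character class that stops at the first closing quote.

```python
import re

# Hypothetical sample text containing two quoted strings.
text = 'The name "McDonald\'s" is said "makudonarudo" in Japanese'

# Greedy dot-star: runs from the first quote to the LAST quote in the text,
# so it swallows everything between the two quoted strings as well.
greedy = re.search(r'".*"', text).group()

# Negated character class: cannot cross a quote character,
# so the match ends at the first closing quote.
first_only = re.search(r'"[^"]*"', text).group()

print(greedy)
print(first_only)
```

Both expressions are "correct" in the sense that each matches a quoted span; which one is right depends on what you want to match and what you want to leave alone.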

For an NFA engine, another part of the balance, discussed primarily in the next chapter, is efficiency. A poorly designed regex—even one that would otherwise be considered correct—can cripple an engine.

This chapter consists mostly of examples, as I lead you through my thought processes in solving a number of problems. I encourage you to read through them even if a particular example seems to offer nothing toward your immediate needs.

For instance, even if you don’t work with HTML, I encourage you to absorb the examples that deal with HTML. This is because writing a good regular expression is more than a skill; it’s an art. One doesn’t teach or learn this art with lists or rules, but rather through experience, so I’ve written these examples to illustrate some of the insight that experience has given me over the years.

You’ll still need your own experience to internalize that insight, but spending time with the examples in this chapter is a good first step.

Regex Balancing ...
