Introducing Regular Expressions by Michael Fitzgerald

Chapter 9. Marking Up a Document with HTML

This chapter will take you step by step through the process of marking up plain-text documents with HTML5 using regular expressions, concluding what we started early in the book.

Now, if it were me, I’d use AsciiDoc to do this work. But for our purposes here, we’ll pretend that there is no such thing as AsciiDoc (what a shame). We’ll plod along using a few tools we have at hand—namely, sed and Perl—and our own ingenuity.

For our text we’ll still use Coleridge’s poem in rime.txt.

Note

The scripts in this chapter work well with rime.txt because you understand the structure of that file. These scripts will give you less predictable results when used on arbitrary text files; however, they give you a starting point for handling text structures in more complex files.

Matching Tags

Before we start adding markup to the poem, let’s talk about how to match HTML or XML tags. There are a variety of ways to match a tag, whether a start-tag (e.g., <html>) or an end-tag (e.g., </html>), but I have found the one that follows to be reliable. It matches start-tags, with or without attributes:

<[_a-zA-Z][^>]*>

Here is what it does:

  • The first character is a left angle bracket (<).

  • Element names can begin with an underscore character (_) in XML or with a letter in the ASCII range, in either upper- or lowercase (see Technical Notes).

  • After that start character, the rest of the tag can contain zero or more characters of any kind other than a right angle bracket (>).

  • The expression ...
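
To see the pattern in action, here is a minimal sketch that runs it over a line of markup from the command line. It assumes a grep that supports the -o option (GNU and BSD grep both do); the helper name match_tags and the sample input are made up for illustration and are not part of the chapter's scripts.

```shell
# The tag-matching pattern from the text, kept in a variable for reuse.
tag_re='<[_a-zA-Z][^>]*>'

# match_tags (hypothetical helper): print each start-tag found on
# standard input, one match per line.
match_tags() {
  grep -oE "$tag_re"
}

# Sample input (an assumption for this demo): two start-tags, two end-tags.
printf '%s\n' '<html lang="en"><body></body></html>' | match_tags
# prints:
# <html lang="en">
# <body>
```

The end-tags are skipped because the character after the left angle bracket is a slash, which is neither an underscore nor an ASCII letter.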
