Introducing Regular Expressions

Name: Introducing Regular Expressions
Author: Michael Fitzgerald
ISBN: 9781449392680

July 2012

Beginner

151 pages

3h 27m

English

O'Reilly Media, Inc.

Preface
Who Should Read This Book; What You Need to Use This Book; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
1. What Is a Regular Expression?
Getting Started with Regexpal; Matching a North American Phone Number; Matching Digits with a Character Class; Using a Character Shorthand; Matching Any Character; Capturing Groups and Back References; Using Quantifiers; Quoting Literals; A Sample of Applications; What You Learned in Chapter 1; Technical Notes
2. Simple Pattern Matching
Matching String Literals; Matching Digits; Matching Non-Digits; Matching Word and Non-Word Characters; Matching Whitespace; Matching Any Character, Once Again; Marking Up the Text; Using sed to Mark Up Text; Using Perl to Mark Up Text; What You Learned in Chapter 2; Technical Notes
3. Boundaries
The Beginning and End of a Line; Word and Non-word Boundaries; Other Anchors; Quoting a Group of Characters as Literals; Adding Tags; Adding Tags with sed; Adding Tags with Perl; What You Learned in Chapter 3; Technical Notes
4. Alternation, Groups, and Backreferences
Alternation; Subpatterns; Capturing Groups and Backreferences; Named Groups; Non-Capturing Groups; Atomic Groups; What You Learned in Chapter 4; Technical Notes
5. Character Classes
Negated Character Classes; Union and Difference; POSIX Character Classes; What You Learned in Chapter 5; Technical Notes
6. Matching Unicode and Other Characters
Matching a Unicode Character; Using vim; Matching Characters with Octal Numbers; Matching Unicode Character Properties; Matching Control Characters; What You Learned in Chapter 6; Technical Notes
7. Quantifiers
Greedy, Lazy, and Possessive; Matching with *, +, and ?; Matching a Specific Number of Times; Lazy Quantifiers; Possessive Quantifiers; What You Learned in Chapter 7; Technical Notes
8. Lookarounds
Positive Lookaheads; Negative Lookaheads; Positive Lookbehinds; Negative Lookbehinds; What You Learned in Chapter 8; Technical Notes
9. Marking Up a Document with HTML
Matching Tags; Transforming Plain Text with sed; Substitution with sed; Handling Roman Numerals with sed; Handling a Specific Paragraph with sed; Handling the Lines of the Poem with sed; Appending Tags; Using a Command File with sed; Transforming Plain Text with Perl; Handling Roman Numerals with Perl; Handling a Specific Paragraph with Perl; Handling the Lines of the Poem with Perl; Using a File of Commands with Perl; What You Learned in Chapter 9; Technical Notes
10. The End of the Beginning
Learning More; Notable Tools, Implementations, and Libraries; Perl; PCRE; Ruby (Oniguruma); Python; RE2; Matching a North American Phone Number; Matching an Email Address; What You Learned in Chapter 10
A. Regular Expression Reference
Regular Expressions in QED; Metacharacters; Character Shorthands; Whitespace; Unicode Whitespace Characters; Control Characters; Character Properties; Script Names for Character Properties; POSIX Character Classes; Options/Modifiers; ASCII Code Chart with Regex; Technical Notes
Regular Expression Glossary
Index
About the Author
Colophon
Copyright

Content preview from Introducing Regular Expressions

Chapter 9. Marking Up a Document with HTML

This chapter will take you step by step through the process of marking up plain-text documents with HTML5 using regular expressions, concluding what we started early in the book.

Now, if it were me, I’d use AsciiDoc to do this work. But for our purposes here, we’ll pretend that there is no such thing as AsciiDoc (what a shame). We’ll plod along using a few tools we have at hand—namely, sed and Perl—and our own ingenuity.

For our text we’ll still use Coleridge’s poem in rime.txt.

Note

The scripts in this chapter work well with rime.txt because you understand the structure of that file. These scripts will give you less predictable results when used on arbitrary text files; however, they give you a starting point for handling text structures in more complex files.

Matching Tags

Before we start adding markup to the poem, let’s talk about how to match either HTML or XML tags. There are a variety of ways to match a tag, either start-tags (e.g., <html>) or end-tags (e.g., </html>), but I have found the one that follows to be reliable. It will match start-tags, with or without attributes:

<[_a-zA-Z][^>]*>

Here is what it does:

The first character is a left angle bracket (<).
Elements can begin with an underscore character (_) in XML or a letter in the ASCII range, in either upper- or lowercase (see Technical Notes).
After the start character, the tag can continue with zero or more characters, each of which can be any character other than a right angle bracket (>).
The expression ...
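
To see the pattern at work, here is a minimal sketch that tries it out with Python's re module rather than sed or Perl. The choice of Python, the sample string, and the function name find_start_tags are assumptions made for illustration; none of them come from the book or from rime.txt.

```python
import re

# The pattern from the text: a left angle bracket, then a letter or an
# underscore, then zero or more characters that are not ">", then ">".
START_TAG = re.compile(r"<[_a-zA-Z][^>]*>")

# Hypothetical sample markup (not taken from rime.txt).
sample = '<html lang="en"><body><p>The Rime</p></body></html>'


def find_start_tags(text):
    """Return every substring of text matched by the start-tag pattern."""
    return START_TAG.findall(text)


if __name__ == "__main__":
    for tag in find_start_tags(sample):
        print(tag)
```

End-tags such as </p> are skipped because the character after < is a slash, which the class [_a-zA-Z] does not allow. One caveat with a pattern this simple: an attribute value that itself contains > would cut the match short, so treat it as a starting point rather than a full parser.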

Publisher Resources

ISBN: 9781449338879
