Introducing Regular Expressions

Name: Introducing Regular Expressions
Author: Michael Fitzgerald
ISBN: 9781449392680

July 2012

Beginner

151 pages

3h 27m

English

O'Reilly Media, Inc.


Introducing Regular Expressions
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Preface
  Who Should Read This Book; What You Need to Use This Book; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
1. What Is a Regular Expression?
  Getting Started with Regexpal; Matching a North American Phone Number; Matching Digits with a Character Class; Using a Character Shorthand; Matching Any Character; Capturing Groups and Back References; Using Quantifiers; Quoting Literals; A Sample of Applications; What You Learned in Chapter 1; Technical Notes
2. Simple Pattern Matching
  Matching String Literals; Matching Digits; Matching Non-Digits; Matching Word and Non-Word Characters; Matching Whitespace; Matching Any Character, Once Again; Marking Up the Text; Using sed to Mark Up Text; Using Perl to Mark Up Text; What You Learned in Chapter 2; Technical Notes
3. Boundaries
  The Beginning and End of a Line; Word and Non-word Boundaries; Other Anchors; Quoting a Group of Characters as Literals; Adding Tags; Adding Tags with sed; Adding Tags with Perl; What You Learned in Chapter 3; Technical Notes
4. Alternation, Groups, and Backreferences
  Alternation; Subpatterns; Capturing Groups and Backreferences; Named Groups; Non-Capturing Groups; Atomic Groups; What You Learned in Chapter 4; Technical Notes
5. Character Classes
  Negated Character Classes; Union and Difference; POSIX Character Classes; What You Learned in Chapter 5; Technical Notes
6. Matching Unicode and Other Characters
  Matching a Unicode Character; Using vim; Matching Characters with Octal Numbers; Matching Unicode Character Properties; Matching Control Characters; What You Learned in Chapter 6; Technical Notes
7. Quantifiers
  Greedy, Lazy, and Possessive; Matching with *, +, and ?; Matching a Specific Number of Times; Lazy Quantifiers; Possessive Quantifiers; What You Learned in Chapter 7; Technical Notes
8. Lookarounds
  Positive Lookaheads; Negative Lookaheads; Positive Lookbehinds; Negative Lookbehinds; What You Learned in Chapter 8; Technical Notes
9. Marking Up a Document with HTML
  Matching Tags; Transforming Plain Text with sed; Substitution with sed; Handling Roman Numerals with sed; Handling a Specific Paragraph with sed; Handling the Lines of the Poem with sed; Appending Tags; Using a Command File with sed; Transforming Plain Text with Perl; Handling Roman Numerals with Perl; Handling a Specific Paragraph with Perl; Handling the Lines of the Poem with Perl; Using a File of Commands with Perl; What You Learned in Chapter 9; Technical Notes
10. The End of the Beginning
  Learning More; Notable Tools, Implementations, and Libraries; Perl; PCRE; Ruby (Oniguruma); Python; RE2; Matching a North American Phone Number; Matching an Email Address; What You Learned in Chapter 10
A. Regular Expression Reference
  Regular Expressions in QED; Metacharacters; Character Shorthands; Whitespace; Unicode Whitespace Characters; Control Characters; Character Properties; Script Names for Character Properties; POSIX Character Classes; Options/Modifiers; ASCII Code Chart with Regex; Technical Notes
Regular Expression Glossary
Index
About the Author
Colophon
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Copyright

Content preview from Introducing Regular Expressions

Chapter 10. The End of the Beginning

“Unix was not designed to stop you from doing stupid things, because that would also stop you from doing clever things.” —Doug Gwyn

Congratulations on making it this far. You’re not a regular expression novice anymore. You have been introduced to the most commonly used regular expression syntax, and it will open up a lot of possibilities for you in your work as a programmer.

Learning regular expressions has saved me a lot of time. Let me give you an example.

I use a lot of XSLT at work, and often I have to analyze the tags that exist in a group of XML files.

I showed you part of this in the last chapter, but here is a long one-liner that takes a list of tag names from lorem.dita and converts it into a simple XSLT stylesheet:

grep -Eo '<[_a-zA-Z][^>]*>' lorem.dita | sort | uniq | sed '1 i\
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\

; s/^</\
<xsl:template match="/;s/ id=\".*\"//;s/>$/">\
 <xsl:apply-templates\/>\
<\/xsl:template>/;$ a\
\
</xsl:stylesheet>\
'

I know this script may appear a bit acrobatic, but after you work with this stuff for a long time, you start thinking like this. I am not even going to explain what I’ve done here, because I am sure you can figure it out on your own now.

Here is what the output looks like:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="body">
 <xsl:apply-templates/>
</xsl:template>

<xsl:template match="li">
 <xsl:apply-templates/>
...
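If you would like to check your reading of the one-liner, the following minimal sketch breaks out two of its steps on a small, made-up sample. The real lorem.dita is not reproduced here, so the sample text is an assumption, as are the function names list_tags and tag_names. The id-stripping expression is a slightly stricter variant of the one in the one-liner, not the original.

```shell
#!/bin/sh
# Hypothetical sample input; it stands in for lorem.dita, which is not shown.
sample='<body>
<p id="p1">Lorem ipsum</p>
<ul>
<li>dolor</li>
<li>sit</li>
</ul>
</body>'

# Step 1: print only the matched opening tags, one per line (-o),
# then sort them and drop duplicates. Closing tags such as </li> do not
# match, because "/" is not in the class [_a-zA-Z].
# LC_ALL=C is added here only to make the sort order predictable.
list_tags() {
  grep -Eo '<[_a-zA-Z][^>]*>' | LC_ALL=C sort | uniq
}

# Step 2 (simplified): remove an id attribute and the angle brackets,
# leaving the bare element name that the one-liner places in match="...".
# [^"]* is an assumption of this sketch; the one-liner uses the greedy .* form.
tag_names() {
  sed 's/ id="[^"]*"//; s/^<//; s/>$//'
}

printf '%s\n' "$sample" | list_tags | tag_names
```

Run against the sample, this prints body, li, p, and ul, one name per line. The full one-liner then wraps each name in an xsl:template element instead of printing it bare.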


Publisher Resources

ISBN: 9781449338879
