
Introducing Regular Expressions

Name: Introducing Regular Expressions
Author: Michael Fitzgerald
ISBN: 9781449392680

by Michael Fitzgerald

July 2012

Beginner

151 pages

3h 27m

English

O'Reilly Media, Inc.


Introducing Regular Expressions
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Preface
    Who Should Read This Book
    What You Need to Use This Book
    Conventions Used in This Book
    Using Code Examples
    Safari® Books Online
    How to Contact Us
    Acknowledgments
1. What Is a Regular Expression?
    Getting Started with Regexpal
    Matching a North American Phone Number
    Matching Digits with a Character Class
    Using a Character Shorthand
    Matching Any Character
    Capturing Groups and Back References
    Using Quantifiers
    Quoting Literals
    A Sample of Applications
    What You Learned in Chapter 1
    Technical Notes
2. Simple Pattern Matching
    Matching String Literals
    Matching Digits
    Matching Non-Digits
    Matching Word and Non-Word Characters
    Matching Whitespace
    Matching Any Character, Once Again
    Marking Up the Text
    Using sed to Mark Up Text
    Using Perl to Mark Up Text
    What You Learned in Chapter 2
    Technical Notes
3. Boundaries
    The Beginning and End of a Line
    Word and Non-word Boundaries
    Other Anchors
    Quoting a Group of Characters as Literals
    Adding Tags
    Adding Tags with sed
    Adding Tags with Perl
    What You Learned in Chapter 3
    Technical Notes
4. Alternation, Groups, and Backreferences
    Alternation
    Subpatterns
    Capturing Groups and Backreferences
    Named Groups
    Non-Capturing Groups
    Atomic Groups
    What You Learned in Chapter 4
    Technical Notes
5. Character Classes
    Negated Character Classes
    Union and Difference
    POSIX Character Classes
    What You Learned in Chapter 5
    Technical Notes
6. Matching Unicode and Other Characters
    Matching a Unicode Character
    Using vim
    Matching Characters with Octal Numbers
    Matching Unicode Character Properties
    Matching Control Characters
    What You Learned in Chapter 6
    Technical Notes
7. Quantifiers
    Greedy, Lazy, and Possessive
    Matching with *, +, and ?
    Matching a Specific Number of Times
    Lazy Quantifiers
    Possessive Quantifiers
    What You Learned in Chapter 7
    Technical Notes
8. Lookarounds
    Positive Lookaheads
    Negative Lookaheads
    Positive Lookbehinds
    Negative Lookbehinds
    What You Learned in Chapter 8
    Technical Notes
9. Marking Up a Document with HTML
    Matching Tags
    Transforming Plain Text with sed
    Substitution with sed
    Handling Roman Numerals with sed
    Handling a Specific Paragraph with sed
    Handling the Lines of the Poem with sed
    Appending Tags
    Using a Command File with sed
    Transforming Plain Text with Perl
    Handling Roman Numerals with Perl
    Handling a Specific Paragraph with Perl
    Handling the Lines of the Poem with Perl
    Using a File of Commands with Perl
    What You Learned in Chapter 9
    Technical Notes
10. The End of the Beginning
    Learning More
    Notable Tools, Implementations, and Libraries
    Perl
    PCRE
    Ruby (Oniguruma)
    Python
    RE2
    Matching a North American Phone Number
    Matching an Email Address
    What You Learned in Chapter 10
A. Regular Expression Reference
    Regular Expressions in QED
    Metacharacters
    Character Shorthands
    Whitespace
    Unicode Whitespace Characters
    Control Characters
    Character Properties
    Script Names for Character Properties
    POSIX Character Classes
    Options/Modifiers
    ASCII Code Chart with Regex
    Technical Notes
Regular Expression Glossary
Index
About the Author
Colophon
SPECIAL OFFER: Upgrade this ebook with O’Reilly
Copyright

Content preview from Introducing Regular Expressions

Chapter 1. What Is a Regular Expression?

Regular expressions are specially encoded text strings used as patterns for matching sets of strings. They began to emerge in the 1940s as a way to describe regular languages, but they really began to show up in the programming world during the 1970s. The first place I could find them showing up was in the QED text editor written by Ken Thompson.

“A regular expression is a pattern which specifies a set of strings of characters; it is said to match certain strings.” —Ken Thompson

Regular expressions later became an important part of the tool suite that emerged from the Unix operating system—the ed, sed and vi (vim) editors, grep, AWK, among others. But the ways in which regular expressions were implemented were not always so regular.

Note

This book takes an inductive approach; in other words, it moves from the specific to the general. So rather than an example after a treatise, you will often get the example first and then a short treatise following that. It’s a learn-by-doing book.

Regular expressions have a reputation for being gnarly, but that all depends on how you approach them. There is a natural progression from something as simple as this:

\d

a character shorthand that matches any digit from 0 to 9, to something a bit more complicated, like:

^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$

which is where we’ll wind up at the end of this chapter: a fairly robust regular expression that matches a 10-digit, North American telephone number, with or without ...
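The two patterns quoted above can be tried out with a minimal sketch. Python's re module is an assumption here, since the excerpt does not tie these patterns to a particular language. The helper name is_phone_number and the sample numbers are hypothetical, chosen only for illustration. The re.ASCII flag is used so that \d means exactly the digits 0 to 9, as the text describes; without it, Python's \d also matches other Unicode decimal digits.

```python
import re

# The two patterns quoted in the text, written as raw strings.
# re.ASCII restricts \d to the digits 0-9 (an assumption made to match
# the text's description; Python's default \d is Unicode-aware).
DIGIT = re.compile(r"\d", re.ASCII)
PHONE = re.compile(r"^(\(\d{3}\)|^\d{3}[.-]?)?\d{3}[.-]?\d{4}$", re.ASCII)


def is_phone_number(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches PHONE."""
    return PHONE.match(text) is not None


# \d matches one digit at a time, so findall returns each digit separately
# and skips the hyphens.
digits = DIGIT.findall("707-827-7019")

# Illustrative sample strings (not taken from the excerpt).
samples = [
    "707-827-7019",    # hyphen-separated
    "(707)827-7019",   # parenthesized area code
    "707.827.7019",    # dot-separated
    "7078277019",      # no separators
    "827-7019",        # no area code: the first group is optional
    "707-827-701",     # one digit short
    "(707) 827-7019",  # space after the area code is not allowed
]

for sample in samples:
    print(f"{sample!r}: {is_phone_number(sample)}")
```

Because the area-code group is followed by ?, this pattern also accepts a seven-digit number with no area code, while the last two samples are rejected.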


Publisher Resources

ISBN: 9781449338879
Errata Page
