Regular Expressions
Regular expressions are a powerful language for describing and manipulating text. A regular expression is applied to a string, that is, to a sequence of characters. Often that string is an entire text document.
The result of applying a regular expression to a string is either a substring of the original, or a new string representing a modified version of some part of the original. Remember that strings are immutable, so the regular expression cannot change the original string.
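The two kinds of result can be seen in a minimal C# sketch using the System.Text.RegularExpressions namespace (written with C# 9 top-level statements; the replacement text "LIBERTY" is just an illustrative choice). Regex.Match hands back a substring of the input, while Regex.Replace builds and returns a new string and leaves the original untouched.

```csharp
using System;
using System.Text.RegularExpressions;

// The original string. C# strings are immutable.
string source = "One,Two,Three Liberty Associates, Inc.";

// Extracting a substring: Match.Value holds the text that matched.
Match match = Regex.Match(source, "Liberty");
string extracted = match.Success ? match.Value : "";

// Modifying text: Regex.Replace returns a NEW string; 'source' is not changed.
string modified = Regex.Replace(source, "Liberty", "LIBERTY");

Console.WriteLine(extracted); // Liberty
Console.WriteLine(modified);  // One,Two,Three LIBERTY Associates, Inc.
Console.WriteLine(source);    // One,Two,Three Liberty Associates, Inc.
```

The last line shows that the original string still reads exactly as it did before the replacement.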
By applying a properly constructed regular expression to the following string:
One,Two,Three Liberty Associates, Inc.
you can return any or all of its substrings (e.g., Liberty or One), or modified versions of its substrings (e.g., LIBeRtY or OnE). What the regular expression does is determined by the syntax of the regular expression itself.
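Here is a minimal C# sketch of both operations on that same string, assuming two illustrative patterns that are not taken from the text. The pattern \w+ matches each run of word characters, so every match is one substring such as One or Liberty. Regex.Replace with a MatchEvaluator (a lambda here) computes each replacement from the match itself. A narrower pattern such as e\b (an "e" at the end of a word) modifies only part of a substring.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string source = "One,Two,Three Liberty Associates, Inc.";

// Return substrings: each \w+ match is one word of the source.
var words = new List<string>();
foreach (Match w in Regex.Matches(source, @"\w+"))
{
    words.Add(w.Value);
}

// Return a modified version: the lambda (a MatchEvaluator) upper-cases each word.
string shouted = Regex.Replace(source, @"\w+", match => match.Value.ToUpperInvariant());

// Modify only part of a substring: an 'e' at a word boundary becomes 'E'.
string tweaked = Regex.Replace(source, @"e\b", "E");

Console.WriteLine(string.Join("|", words)); // One|Two|Three|Liberty|Associates|Inc
Console.WriteLine(shouted);                 // ONE,TWO,THREE LIBERTY ASSOCIATES, INC.
Console.WriteLine(tweaked);                 // OnE,Two,ThreE Liberty Associates, Inc.
```

Note that e\b also changes Three to ThreE, because its final "e" sits at a word boundary as well. The pattern alone decides which parts are touched.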
A regular expression consists of two types of characters: literals and metacharacters.
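A quick way to see the two kinds of characters side by side is the small C# sketch below (the sample strings are illustrative). In the pattern Inc. the letters I, n, and c are literals, but the period is a metacharacter that matches any single character except a newline. Prefixing the period with a backslash turns it back into a literal, and Regex.Escape performs that escaping for every metacharacter in a string.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Liberty Associates, Inc.";

// '.' is a metacharacter here: it matches the 'o' in "Incorporated".
bool anyChar = Regex.IsMatch("Incorporated", "Inc.");

// "\." is a literal period, so "Incorporated" no longer matches...
bool literalDot = Regex.IsMatch("Incorporated", @"Inc\.");
// ...but the company name, which really contains "Inc.", does.
bool literalDotInText = Regex.IsMatch(text, @"Inc\.");

// Regex.Escape backslash-escapes the metacharacters in its input.
string escaped = Regex.Escape("Inc.");

Console.WriteLine(anyChar);          // True
Console.WriteLine(literalDot);       // False
Console.WriteLine(literalDotInText); // True
Console.WriteLine(escaped);          // Inc\.
```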
A literal is just a character you wish to match in the target string. A metacharacter is a special symbol that acts as a command to the regular expression parser, the engine responsible for interpreting the regular expression. For example, if you create a regular expression:
^(From|To|Subject|Date):
this will match any substring consisting of the letters "From", "To", "Subject", or "Date", so long as those letters start a new line (^) and are immediately followed by a colon (:).
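To see which inputs that expression accepts, the pattern can be run against a few one-line samples in a minimal C# sketch. The sample lines are invented for illustration. Match.Groups[1] holds whichever alternative inside the parentheses matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from the text: one of four words at the start, then a colon.
var header = new Regex("^(From|To|Subject|Date):");

string[] samples =
{
    "Subject: Quarterly report", // word at the start, followed by ':'
    "To: bob@example.com",       // shortest alternative
    "Re: Subject: hello",        // "Subject:" present, but not at the start
    "Date 2024-01-01",           // right word, but no colon after it
};

var results = new List<string>();
foreach (string s in samples)
{
    Match m = header.Match(s);
    // Groups[1] is the text captured by (From|To|Subject|Date).
    string result = m.Success ? m.Groups[1].Value : "no match";
    results.Add(result);
    Console.WriteLine($"{s} -> {result}");
}
```

Only the first two samples match. The third fails the ^ anchor and the fourth fails the required colon.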
The caret (^) in this case indicates to the regular expression parser ...
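In .NET, whether the caret matches at the start of every line depends on an option. By default ^ matches only at the very start of the input string. With RegexOptions.Multiline it also matches immediately after each newline. The minimal sketch below counts matches both ways, using a hypothetical three-line header block.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = "^(From|To|Subject|Date):";

// Hypothetical header block with one field per line.
string headers = "From: alice@example.com\nTo: bob@example.com\nSubject: Lunch\n";

// Default: ^ anchors only at the start of the whole string.
int defaultCount = Regex.Matches(headers, pattern).Count;

// Multiline: ^ also anchors right after each '\n'.
int multilineCount = Regex.Matches(headers, pattern, RegexOptions.Multiline).Count;

Console.WriteLine(defaultCount);   // 1
Console.WriteLine(multilineCount); // 3
```

When scanning a whole document for header lines, the Multiline option is therefore usually what you want.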
Get Programming C# now with the O’Reilly learning platform.