Regular expressions are arrangements of characters that form a pattern that can then be used against strings to find matches, make replacements, or locate specific substrings. Most programming languages support some form of regular expressions, and JavaScript is no exception.
Regular expressions can be created explicitly using the RegExp object, although you can also create one using a literal, as was demonstrated with the string literal in the last section. The following line uses the explicit option:
var searchPattern = new RegExp('s+');
The next line of code demonstrates the literal RegExp option:
var searchPattern = /s+/;
In both cases, the plus sign (+) in the search pattern matches one or more consecutive s characters in a string. The forward slashes surrounding the literal (/s+/) mark that the object being created is a regular expression and not some other type of object.
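As a rough sketch of how the two forms relate, the following builds the same pattern both ways and checks each against a sample string. The variable names and sample strings are illustrative assumptions, and console.log is used for output so the snippet can run outside a browser. When the pattern is passed to the constructor as a string, any backslash in it has to be doubled, because the string literal consumes one level of escaping.

```javascript
// Two ways to build the same pattern: one or more consecutive s characters.
var fromConstructor = new RegExp('s+');
var fromLiteral = /s+/;

// Both objects hold the same pattern text.
console.log(fromConstructor.source); // s+
console.log(fromLiteral.source);     // s+

// Both find a run of s characters in "misses" and nothing in "plum".
console.log(fromConstructor.test('misses')); // true
console.log(fromLiteral.test('plum'));       // false

// In a constructor string, backslashes are doubled: '\\d+' is the pattern \d+.
var digitsFromConstructor = new RegExp('\\d+');
console.log(digitsFromConstructor.source === /\d+/.source); // true
```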
The RegExp object has only two unique methods of interest: test and exec. The test method determines whether a string passed in as a parameter matches the regular expression. In the following example, the pattern /JavaScript rules/ is tested against the string to see whether a match is found:
var re = /JavaScript rules/;
var str = "JavaScript rules";
if (re.test(str)) document.writeln("I guess it does rule");
Matches are case-sensitive: if the pattern is instead /Javascript rules/, the result is false. To instruct the pattern-matching functions to ignore case, follow the second forward slash of the regular expression with the letter i:
var re = /Javascript rules/i;
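The other flags are g for a global match and m for multiline matching, in which the caret and dollar sign match at the start and end of each line rather than only at the start and end of the string. If using RegExp to generate the regular expression, pass these to the constructor as a second parameter:
var searchPattern = new RegExp('s+', 'g');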
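A minimal sketch of the flags in use follows, assuming a lowercase pattern and the sample string from the earlier test example. It shows the flag in both the literal and constructor forms, and reads back the standard global, ignoreCase, and multiline properties that reflect which flags were set.

```javascript
// Flags can follow the closing slash of a literal...
var literalPattern = /javascript rules/i;
// ...or be passed as the second argument to the constructor.
var constructedPattern = new RegExp('javascript rules', 'i');

var str = "JavaScript rules";
console.log(literalPattern.test(str));     // true: case is ignored
console.log(constructedPattern.test(str)); // true
console.log(/javascript rules/.test(str)); // false: case-sensitive by default

// Each flag is reflected in a read-only property on the RegExp object.
var allFlags = new RegExp('s+', 'gim');
console.log(allFlags.global);     // true
console.log(allFlags.ignoreCase); // true
console.log(allFlags.multiline);  // true
```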
In the following snippet of code, the RegExp method exec searches for a specific pattern, /JS*/, across the entire string (g), ignoring case (i):
var re = /JS*/ig;
var str = "cfdsJS *(&YJSjs 888JS";
var resultArray = re.exec(str);
while (resultArray) {
   document.writeln(resultArray[0]);
   resultArray = re.exec(str);
}
The pattern described in the regular expression is the letter J, followed by any number of S’s. Since the i flag is used, case is ignored, so the js substring is found. Because the g flag is given, the expression’s lastIndex property is set to the position just past the most recent match on each successive call, so each call to exec picks up where the previous one stopped and finds the next match. In all, the four items found are printed out, and when no others are found, exec returns null, which is assigned to resultArray. This ends the loop.
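To make the role of lastIndex easier to see, here is a small sketch that runs the same pattern against the same string but records each match in an array instead of writing it to the page. The matches array and its field names are assumptions added for illustration.

```javascript
var re = /JS*/ig;
var str = "cfdsJS *(&YJSjs 888JS";

// Collect the text of each match, where it started, and where the
// next search will begin (re.lastIndex) after that match.
var matches = [];
var result;
while ((result = re.exec(str)) !== null) {
  matches.push({
    text: result[0],
    index: result.index,
    nextStart: re.lastIndex
  });
}

for (var i = 0; i < matches.length; i++) {
  console.log(matches[i].text + ' found at ' + matches[i].index +
              ', next search starts at ' + matches[i].nextStart);
}

// After exec fails to find a match, lastIndex is reset to 0.
console.log(re.lastIndex); // 0
```

Without the g flag, lastIndex is not advanced, so a loop like this would find the first match over and over and never end.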
These code samples have demonstrated a couple of the special regular-expression characters: the plus sign and the asterisk. There are several more.
Typically, books and articles throw all such characters into a table, and then provide a couple of examples where several are used together in a long and complicated pattern, and that’s the extent of the coverage. Because of this, there are many people who have a lot of trouble putting together regular expressions and, as a consequence, their applications don’t work as they originally anticipated. I think that regular expressions are important enough to at least provide several examples, from simple to complex. If you have worked with regular expressions before, you might want to skip this section—unless you need the review.
Though the RegExp methods are used in applications, regular expressions and the RegExp object are used primarily with the String object’s regex methods: replace, match, and search. The rest of the examples in this section demonstrate regular expressions using these methods.
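As a quick orientation before the detailed examples, the following sketch calls each of the three String methods with a simple pattern. The sample sentence and the pattern are assumptions chosen only to show what each method returns.

```javascript
var str = "The rain in Spain";

// search returns the index of the first match, or -1 if there is none.
var firstIndex = str.search(/ain/);
console.log(firstIndex); // 5

// match with the g flag returns an array of every matched substring.
var allMatches = str.match(/ain/g);
console.log(allMatches.join(',')); // ain,ain

// replace returns a new string; the original string is left unchanged.
var replaced = str.replace(/ain/g, 'AIN');
console.log(replaced); // The rAIN in SpAIN
console.log(str);      // The rain in Spain
```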
The first special character to look at is the backslash (\), usually called the escape character because it’s used to escape whatever character follows. In JavaScript regular expressions, this results in two behaviors. If the character is usually treated literally, such as the letter s, it’s treated as a special character when it follows the escape character; in this case, \s matches a whitespace character (space, tab, form feed, line feed). If the backslash is used with a special character, such as the plus sign earlier, the character is treated as a literal.
Example 4-5 searches for instances of a space that’s followed by an asterisk, and replaces them with a dash. Normally, the asterisk is used to match zero or more of the preceding characters in a regular expression, but in this case, we want to treat it as a literal.
Example 4-5. Escape character in regular expressions
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>The Backslash in RegExp</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<script type="text/javascript">
//<![CDATA[
var regExp = /\s\*/g;
var str = "This *is *a *test *string";
var resultString = str.replace(regExp,'-');
document.writeln(resultString);
//]]>
</script>
</body>
</html>
The result of applying the regular expression against the string is the following line:
This-is-a-test-string
This is a very handy expression to keep in mind. If you want to replace all occurrences of spaces in a string with dashes, regardless of what’s following the spaces, use the pattern /\s/g in the replace method, passing in the hyphen as the replacement character.
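A short sketch of that replacement follows, using a sample string of my own. It also shows what happens when the g flag is left off, and that \s treats a tab as whitespace too.

```javascript
var str = "This is a test string";

// With the g flag, every whitespace character is replaced.
var allDashed = str.replace(/\s/g, '-');
console.log(allDashed); // This-is-a-test-string

// Without the g flag, only the first match is replaced.
var firstDashed = str.replace(/\s/, '-');
console.log(firstDashed); // This-is a test string

// \s matches tabs as well as spaces.
var tabbed = "one\ttwo three".replace(/\s/g, '-');
console.log(tabbed); // one-two-three
```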
Four of the regular-expression characters are used to match specific occurrences of characters: the asterisk (*) matches the character preceding it zero or more times, the plus/addition sign (+) matches the character preceding it one or more times, and the question mark (?) matches zero or one of the preceding characters. The dot (.) matches exactly one character.
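The four characters can be compared side by side in a small sketch. The patterns and test words below are assumptions picked so that each quantifier’s boundary case is visible.

```javascript
// * : zero or more of the preceding character
console.log(/bo*k/.test('bk'));   // true  (zero o's)
console.log(/bo*k/.test('book')); // true  (two o's)

// + : one or more of the preceding character
console.log(/bo+k/.test('bk'));   // false (at least one o is required)
console.log(/bo+k/.test('book')); // true

// ? : zero or one of the preceding character
console.log(/colou?r/.test('color'));   // true
console.log(/colou?r/.test('colour'));  // true
console.log(/colou?r/.test('colouur')); // false (two u's is too many)

// . : exactly one character
console.log(/b.t/.test('bat')); // true
console.log(/b.t/.test('bt'));  // false (nothing between b and t)
```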
Warning
Two patterns of interest are the greedy match (.*) and the lazy star (.*?). In the first, since a period can represent any character, the asterisk matches until the last occurrence of a pattern, rather than the first. If you’re looking for anything within quotes, you might think of using /".*"/. If you use this with a string, such as:
test="one" or this is also a "test"
The match begins with the first double-quote and continues until the last one, not the second:
"one" or this is also a "test"
The lazy star, as in /".*?"/, forces the match to end on the second occurrence of the double quote, rather than the last:
"one"
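The difference can be checked directly with the string from the warning. This is only a sketch; the variable names are assumptions, and the last line adds the g flag to show that the lazy form can then pick out each quoted value separately.

```javascript
var str = 'test="one" or this is also a "test"';

// Greedy: .* runs to the last double quote in the string.
var greedy = str.match(/".*"/)[0];
console.log(greedy); // "one" or this is also a "test"

// Lazy: .*? stops at the first double quote that completes the match.
var lazy = str.match(/".*?"/)[0];
console.log(lazy); // "one"

// With the g flag, the lazy pattern finds each quoted value in turn.
var quoted = str.match(/".*?"/g);
console.log(quoted.join(' and ')); // "one" and "test"
```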
In Example 4-6, the String match method looks for a date in the format of month name followed by a space, day of month, and then year. The date begins after a colon.
Example 4-6. Patterns of repeating characters
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Find Date</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<script type="text/javascript">
//<![CDATA[
var regExp = /:\D*\s\d+\s\d+/;
var str = "This is a date: March 12 2005";
var resultString = str.match(regExp);
document.writeln("Date" + resultString);
//]]>
</script>
</body>
</html>
Looking more closely at the regular expression, the first character in the pattern is the colon, followed by the backslash with a capital letter D: \D. This sequence is one way of looking for any nondigit character; the asterisk following means that any number of nondigit characters will match. The next part in the regular expression is a whitespace character, \s, followed by another new pattern: \d. Unlike the earlier sequence, \D, the lowercase letter means to match digits only. The plus sign following it means one or more digits. Another whitespace character, \s, follows in the pattern, and then another sequence of digits, \d+.
When matched against the string using the String match method, the date preceded by the colon is found, returned, and printed out:
Date: March 12 2005
In the example, \D matches any nonnumber character. Another way to create this particular match is to use the square brackets with a number range, preceded by the caret character (^). If you want to match any character but numbers, use the following:
[^0-9]
The same holds true for \d, except now you want numbers, so leave off the caret:
[0-9]
If you wish to match on more than one character type, you can list each range of characters within the brackets. The following matches on any upper- or lowercase letters:
[A-Za-z]
Using these, the regular expression in Example 4-6 could also be given as:
var regExp = /:[^0-9]*\s[0-9]+\s[0-9]+/;
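As a sanity check on that equivalence, the following sketch runs both forms of the pattern against the string from Example 4-6 and compares the results. The second half uses a made-up string to show a multi-range bracket expression on its own.

```javascript
var str = "This is a date: March 12 2005";

// The shorthand form and the bracketed form find the same text.
var shorthand = str.match(/:\D*\s\d+\s\d+/)[0];
var bracketed = str.match(/:[^0-9]*\s[0-9]+\s[0-9]+/)[0];
console.log(shorthand);               // : March 12 2005
console.log(shorthand === bracketed); // true

// Several ranges can share one pair of brackets.
var letters = "abc123DEF".match(/[A-Za-z]+/g);
console.log(letters.join(',')); // abc,DEF
```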
The caret is used in another pattern: it and the dollar sign anchor a pattern to the beginning or end of a line. The caret, outside of brackets, matches at the beginning of a line; the dollar sign matches at the end of a line.
In the following code snippet, the match is not successful because the character searched did not occur at the beginning of the line:
var regExp = /^The/i; var str = "This is the JavaScript example";
However, the following would be successful:
var regExp = /^The/i; var str = "The example";
If the multiple line flag is given (m), the caret matches on the first character after the line break:
var regExp = /^The/im; var str = "This is\nthe end";
The same positional pattern matching holds true for the end-of-line character. The following doesn’t match:
var regExp = /end$/; var str = "The end is near";
But this does:
var regExp = /end$/; var str = "The end";
If the multiple line flag is used, it matches at the end of the string and just before the line break:
var regExp = /The$/im; var str = "This is really the\nend";
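The snippets above only declare the pattern and the string. A minimal sketch that actually runs them with test is shown here, with the same patterns also tried without the m flag for comparison; the variable names are assumptions added for illustration.

```javascript
// Caret: anchored to the beginning.
var startMiss = /^The/i.test("This is the JavaScript example");
var startHit = /^The/i.test("The example");
console.log(startMiss); // false
console.log(startHit);  // true

// With m, the caret also matches right after a line break.
var multiStart = /^The/im.test("This is\nthe end");
var singleStart = /^The/i.test("This is\nthe end");
console.log(multiStart);  // true
console.log(singleStart); // false

// Dollar sign: anchored to the end.
var endMiss = /end$/.test("The end is near");
var endHit = /end$/.test("The end");
console.log(endMiss); // false
console.log(endHit);  // true

// With m, the dollar sign also matches just before a line break.
var multiEnd = /The$/im.test("This is really the\nend");
var singleEnd = /The$/i.test("This is really the\nend");
console.log(multiEnd);  // true
console.log(singleEnd); // false
```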
The use of parentheses is significant in regular-expression pattern matching. Parentheses match and then remember the match. The remembered values are stored in the result array:
var rgExp = /(^\D*[0-9])/;
var str = "This is fun 01 stuff";
var resultArray = str.match(rgExp);
document.writeln(resultArray);
With this example, the array prints out This is fun 0 twice, separated by a comma indicating two array entries. The first result is the match; the second, the stored value from the parentheses. If, instead of surrounding the entire pattern, you surround only a portion, such as /(^\D*)[0-9]/, this results:
This is fun 0,This is fun
Only the surrounded matched string is stored.
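The two result arrays can also be inspected entry by entry, which makes it clearer what each position holds. This is a sketch with assumed variable names; note that the captured value in the second case keeps the trailing space, since a space is a nondigit character.

```javascript
var str = "This is fun 01 stuff";

// Parentheses around the whole pattern: entry 1 repeats entry 0.
var whole = str.match(/(^\D*[0-9])/);
console.log(whole[0]); // This is fun 0
console.log(whole[1]); // This is fun 0

// Parentheses around only the nondigit part: entry 1 holds just that part.
var partial = str.match(/(^\D*)[0-9]/);
console.log(partial[0]);             // This is fun 0
console.log('[' + partial[1] + ']'); // [This is fun ]

// The result array also records where the match began.
console.log(partial.index); // 0
```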
Parentheses can also help switch material around in a string. RegExp has special characters, labeled $1, $2, and so on to $9, that store substrings discovered through the use of the capturing parentheses. Example 4-7 finds pairs of strings separated by one or more dashes and switches the order of the strings.
Example 4-7. Swapping Strings using regular expressions
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Regular Expression Switch</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<script type="text/javascript">
//<![CDATA[
var rgExp = /(\w*)-*(\w*)/;
var str = "Java--Script";
var resultStrng = str.replace(rgExp,"$2-$1");
document.writeln(resultStrng);
//]]>
</script>
</body>
</html>
Here’s the end result of this JavaScript:
Script-Java
Notice that the number of dashes is also stripped down to just one dash. This example also introduces another very popular pattern-matching character sequence, \w. This sequence matches any alphanumeric character, including the underscore (underline). It’s equivalent to [A-Za-z0-9_]. Its converse is \W, which matches any nonword character.
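The swap and the two character sequences can be tried outside the page with a short sketch. The function-based replacement in the middle is an alternative to the $1 and $2 notation rather than something the example uses, and the sample strings are assumptions.

```javascript
// Swap the two captured words, as in the example.
var swapped = "Java--Script".replace(/(\w*)-*(\w*)/, "$2-$1");
console.log(swapped); // Script-Java

// The same swap with a replacement function: the captured values
// arrive as arguments after the full match.
var swappedAgain = "Java--Script".replace(/(\w+)-+(\w+)/,
  function (match, first, second) {
    return second + '-' + first;
  });
console.log(swappedAgain); // Script-Java

// \w matches letters, digits, and the underscore; \W matches the rest.
var sample = "snake_case 42!";
var wordRuns = sample.match(/\w+/g);
var nonWordRuns = sample.match(/\W+/g);
console.log(wordRuns.join('|'));    // snake_case|42
console.log(nonWordRuns.join('|')); //  |!
```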
The last regular expression characters we’ll examine in detail are the vertical bar (|) and curly braces. The vertical bar separates alternatives, any one of which can match. For instance, the following matches either the letter a or the letter b:
a|b
You can use more than one character with vertical bars to provide more options:
a|b|c
The curly braces indicate repetition of the preceding character a set number of times. In the following, the pattern searched is two s characters together:
s{2}
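Both characters are easy to try out. The following is a sketch with assumed sample strings; the last lines go slightly beyond the text by showing the {min,max} form of the curly braces, which sets a range instead of an exact count.

```javascript
// Vertical bar: either alternative can match.
console.log(/a|b/.test('cab')); // true
console.log(/a|b/.test('xyz')); // false

// Alternatives can be whole words, not just single characters.
var pets = "cat, dog, bird".match(/cat|dog/g);
console.log(pets.join(',')); // cat,dog

// Curly braces: exactly two s characters in a row.
console.log(/s{2}/.test('pass')); // true
console.log(/s{2}/.test('past')); // false

// A range of repetitions: at least two and at most four a's.
var run = "aaaaa".match(/a{2,4}/)[0];
console.log(run); // aaaa
```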
Regular expressions are extremely useful when validating form contents, as demonstrated in Chapter 7.