BUY THIS BOOK
Add to Cart

Print Book $9.95


Add to Cart

PDF $7.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £6.95

What is this?

Looking to Reprint or License this content?


Oracle Regular Expressions Pocket Reference
Oracle Regular Expressions Pocket Reference

By Jonathan Gennick, Peter Linsley
Book Price: $9.95 USD
£6.95 GBP
PDF Price: $7.99

Cover | Table of Contents


Table of Contents

Chapter 1: Oracle Regular Expressions Pocket Reference
With the release of Oracle Database 10g, Oracle has introduced regular expression support to the company's flagship product. Regular expressions are used to describe patterns in text, and they are an invaluable aid when working with loosely formatted textual data.
This little booklet describes Oracle's regular expression support in detail. Its goal is to enable you to take full advantage of the newly introduced regular expression features when querying and manipulating textual data.
This book is divided into the following six sections:
Introduction
You're reading it now.
Tutorial
Provides a short regular expression tutorial aimed at those who aren't already familiar with regular expressions.
Oracle's Regular Expression Support
For readers familiar with regular expressions, describes how they are implemented and used within Oracle. Also includes a description of the key differences between the regular expression implementations of Perl and Oracle.
Regular Expression Quick Reference
Describes the regular expression metacharacters supported by Oracle and provides examples of their usage.
Oracle Regular Expression Functions
Details the new SQL and PL/SQL functions that make up Oracle's regular expression support.
Oracle Regular Expression Error Messages
Lists all of Oracle's regular expression error messages and provides advice as to what do when you encounter a given message.
The following typographical conventions are used in this book:
UPPERCASE
Indicates a SQL or PL/SQL keyword
lowercase
Indicates a user-defined item, such as a table name or a column name, in a SQL or PL/SQL statement
Italic
Indicates URLs, emphasis, or the introduction of new technical terms
Constant width
Used for code examples and for in-text references to table names, column names, regular expressions, and so forth
Constant width bold
Indicates user input in code examples showing both input and output
We thank Debby Russell and Todd Mezzulo of O'Reilly & Associates for believing in and supporting this book. We also thank Barry Trute, Michael Yau, Weiran Zhang, Keni Matsuda, Ken Jacobs, and the others at Oracle Corporation who spent valuable time reviewing this manuscript to ensure its accuracy.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Introduction
With the release of Oracle Database 10g, Oracle has introduced regular expression support to the company's flagship product. Regular expressions are used to describe patterns in text, and they are an invaluable aid when working with loosely formatted textual data.
This little booklet describes Oracle's regular expression support in detail. Its goal is to enable you to take full advantage of the newly introduced regular expression features when querying and manipulating textual data.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Organization of This Book
This book is divided into the following six sections:
Introduction
You're reading it now.
Tutorial
Provides a short regular expression tutorial aimed at those who aren't already familiar with regular expressions.
Oracle's Regular Expression Support
For readers familiar with regular expressions, describes how they are implemented and used within Oracle. Also includes a description of the key differences between the regular expression implementations of Perl and Oracle.
Regular Expression Quick Reference
Describes the regular expression metacharacters supported by Oracle and provides examples of their usage.
Oracle Regular Expression Functions
Details the new SQL and PL/SQL functions that make up Oracle's regular expression support.
Oracle Regular Expression Error Messages
Lists all of Oracle's regular expression error messages and provides advice as to what do when you encounter a given message.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Conventions
The following typographical conventions are used in this book:
UPPERCASE
Indicates a SQL or PL/SQL keyword
lowercase
Indicates a user-defined item, such as a table name or a column name, in a SQL or PL/SQL statement
Italic
Indicates URLs, emphasis, or the introduction of new technical terms
Constant width
Used for code examples and for in-text references to table names, column names, regular expressions, and so forth
Constant width bold
Indicates user input in code examples showing both input and output
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Acknowledgments
We thank Debby Russell and Todd Mezzulo of O'Reilly & Associates for believing in and supporting this book. We also thank Barry Trute, Michael Yau, Weiran Zhang, Keni Matsuda, Ken Jacobs, and the others at Oracle Corporation who spent valuable time reviewing this manuscript to ensure its accuracy.
Peter would like to acknowledge Weiran Zhang for his finesse and intellect as codeveloper of Oracle's regular expression features. Peter would also like to thank Ritsu for being an ever-supportive and encouraging wife.
Jonathan would like to thank Dale Bowen for providing the Spanish sentence used for the collation example; Andrew Sears for spending so much time with Jeff; Jeff for dragging his dad on so many bike rides to the Falling Rock Cafe for ice cream and coffee; and the Falling Rock Cafe for, well, just for being there.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Example Data
Many of the example SQL statements in this book execute against the following table:
CREATE TABLE park (
    park_name NVARCHAR2 (40),
    park_phone NVARCHAR2 (15),
    country VARCHAR2 (2),
    description NCLOB
);
This table contains information on a variety of state, provincial, and national parks from around the world. Much of the information is in free-text form within the description column, making this table an ideal platform on which to demonstrate Oracle's regular expression capabilities.
You can download a script to create the park table and populate it with data from http://oreilly.com/catalog/oracleregexpr.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Tutorial
A regular expression (often known as a regex) is a sequence of characters that describe a pattern in text. Regular expressions use a syntax that has evolved over a number of years, and that is now codified as part of the POSIX standard.
Regular expressions are extremely useful, because they allow you to work with text in terms of patterns. For example, you can use regular expressions to search the park table and identify any park with a description containing text that looks like a phone number. You can then use the same regular expression to extract that phone number from the description.
This tutorial will get you started using regular expressions, but we can only begin to cover the topic in this small book. If you want to learn about regular expressions in depth, see Jeffrey Friedl's excellent book Mastering Regular Expressions (O'Reilly).
The simplest type of pattern is simply an exact string of characters that you are searching for, such as the string in the following WHERE clause:
SELECT *
FROM park
WHERE park_name='Mackinac Island State Park';
However, the string 'Mackinac Island State Park' isn't what most people think of when you mention the word "pattern." The expectation is that a pattern will use so-called metacharacters that allow for matches when you know only the general pattern of text you are looking for.
Standard SQL has long had rather limited support for pattern matching in the form of the LIKE predicate. For example, the following query attempts to return the names of all state parks:
SELECT park_name
FROM park
WHERE park_name LIKE '%State Park%';
The percent (%) characters in this pattern specify that any number of characters are allowed on either side of the string 'State Park'. Any number of characters may be zero characters, so strings in the form 'xxx State Park' fit the pattern. There! I've just used a pattern to describe the operation of a pattern.
Humans have long used patterns as a way to organize and describe text. Look no further than your address and phone number for examples of commonly used patterns.
Handy as it is at times, LIKE is an amazingly weak predicate, supporting only two expression metacharacters that don't even begin to address the range of patterns you might need to describe in your day-to-day work. You need more. You need a richer and more expressive language for describing patterns. You need regular expressions.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Oracle's Regular Expression Support
Oracle's regular expression support manifests itself in the form of three SQL functions and one predicate that you can use to search and manipulate text in any of Oracle's supported text datatypes: VARCHAR2, CHAR, NVARCHAR2, NCHAR, CLOB, and NCLOB.
Regular expression support does not extend to LONG, because LONG is supported only for backward compatibility with existing code.
Following are the four functions you'll use to work with regular expressions in Oracle:
REGEXP_LIKE
Determines whether a specific column, variable, or text literal contains text matching a regular expression.
REGEXP_INSTR
Locates, by character position, an occurrence of text matching a regular expression.
REGEXP_REPLACE
Replaces text matching a regular expression with new text that you specify. Your replacement text can include backreferences to values in the regular expression.
REGEXP_SUBSTR
Extracts text matching a regular expression from a character column, variable, or text literal.
Of these, you've already seen REGEXP_LIKE in quite a few examples. REGEXP_LIKE is documented in the "Conditions" chapter of the Oracle Database 10g SQL Reference because in SQL it can only be used as a predicate in the WHERE and HAVING clauses of a query or DML statement. In PL/SQL, however, you can use REGEXP_LIKE as you would any other Boolean function:
DECLARE
   x Boolean;
BEGIN
   x := REGEXP_LIKE(
        'Does this string mention Oracle?',
        'Oracle');
END;
/
The remaining three functions work identically in SQL and PL/SQL. All four functions are fully described in Section 1.9 near the end of this book.
Oracle is notable for its Globalization Support in that it supports an exceedingly wide variety of character sets, languages, territories, and linguistic sorts. Regular expressions are no exception. The combination of character set, language, and territory is known as a locale. Oracle's regular expression engine respects locale, and is configurable via NLS (National Language Support) parameter settings.
Following are some notable examples of the way in which regular expression locale support affects you:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Regular Expression Quick Reference
This section provides a quick-reference summary of the behavior of all the regular expression metacharacters supported by Oracle.
Most metacharacters are treated as regular characters when used within square brackets ([]). See [ ] (Square Brackets) for more details on this issue.
\ (Backslash)
Escapes a metacharacter
Use the backslash (\) to treat as normal a character that would otherwise have a special meaning. For example, to extract a dollar amount from a sentence, you might escape the period (.) and the dollar sign ($) as follows:
SELECT REGEXP_SUBSTR(
   'This book costs $9.95 in the U.S.',
   '\$[[:digit:]]+\.[[:digit:]]+')
FROM dual;

$9.95
The \$ in this expression requires that the matching text begin with a dollar sign. The \. requires a period between the two digit groups.
To specify a backslash in an expression, escape it with another backslash. The following query retrieves all text up to and including the last backslash:
SELECT REGEXP_SUBSTR(
   'I want this \ but not this',
   '.*\\')
FROM dual;

I want this \
If the character following a backslash in an expression is not a metacharacter, then the backslash is ignored:
SELECT REGEXP_SUBSTR('\test','\test')
FROM dual;

test
The \ has no special meaning within square brackets ([]). When used within square brackets, \ represents itself.
\1 through \9 (Backslash)
References the value matchedby a preceding subexpression
Use \1, \2, \3, etc. through \9 to create backreferences to values matched by preceding subexpressions. You can backreference up to nine subexpressions, the first nine, in any expression. Subexpressions are numbered in the order in which their opening parentheses are encountered when scanning from left to right.
For example, to flip a name from last, first format to first last:
SELECT REGEXP_REPLACE(
   'Sears, Andrew',
   '(.+), (.+)','\2 \1')
FROM dual;

Andrew Sears
For more examples, see Section 1.6.8 in Section 1.6 and REGEXP_REPLACE under Section 1.9.
. (Period)
Matches any character
The period matches any character in the underlying character set of the string to be searched, except that by default it does not match the newline character.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Oracle Regular Expression Functions
Oracle's regular expression support, which we introduced earlier in the book, manifests itself in the form of four functions, which are described in this section. Each function is usable from both SQL and PL/SQL.
All the examples in this section search text literals. We do this to make it obvious how each function works, by showing you both input and output for each example. Typically, you do not use regular expressions to search string literals, but rather to search character columns in the database, or character variables in PL/SQL.
For the same reason, the regular expressions in this section are simple to the extreme. We don't want you puzzling over our expressions when what you really want is to understand the functions.
REGEXP_INSTR
Locates text matching a pattern
REGEXP_INSTR returns the beginning or ending character position of a regular expression within a string. You specify which position you want. The function returns zero if no match is found.
Syntax
REGEXP_INSTR(source_string, pattern 
            [, position [, occurrence
            [, return_option 
            [, match_parameter]]]])
All parameters after the first two are optional. However, to specify any one optional parameter, you must specify all preceding parameters. Thus, if you want to specify match_parameter, you must specify all parameters.
Parameters
source_string
The string you want to search.
pattern
A regular expression describing the text pattern you are searching for. This expression may not exceed 512 bytes in length.
position
The character position at which to begin the search. This defaults to 1, and must be positive.
occurrence
The occurrence of pattern you are interested in finding. This defaults to 1. Specify 2 if you want to find the second occurrence of the pattern, 3 for the third occurrence, and so forth.
return_option
Specify 0 (the default) to return the pattern's beginning character position. Specify 1 to return the ending character position.
match_parameter
A set of options in the form of a character string that change the default manner in which regular expression pattern matching is performed. You may specify any, all, or none of the following options, in any order:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Oracle Regular Expression Error Messages
The following list details Oracle errors specific to regular expressions, and suggests how you might resolve them.
ORA-01760: illegal argument for function
This is not strictly a regular expression error. However, you can get this error if you pass an invalid match_parameter to one of the REGEXP functions. See Section 1.7.3 in Section 1.7 for more details.
You can also get this error by passing an invalid type for any parameter. For example, you'll get this error if you pass a number where a string is expected, or vice-versa.
If you do get this error as the result of a call to one of the REGEXP functions, check to be sure that all your argument types are valid, and that you are passing only valid matching options ('i', 'c', 'm', or 'n') in your match_parameter argument, which is always the last argument of a REGEXP function call.
ORA-12722: regular expression internal error
Contact Oracle Support and open a Technical Assistance Request (TAR), because you've encountered a bug.
ORA-12725: unmatched parentheses in regular expression
You have mismatched parentheses in your expression. For example, an expression like '(a' will cause this error. Carefully check each subexpression to be sure you include both opening and closing parentheses. Check to see whether you've correctly escaped parentheses that do not enclose subexpressions, and make sure you haven't inadvertently escaped a parentheses that should open or close a subexpression.
ORA-12726: unmatched bracket in regular expression
You have mismatched square brackets in your expression. Apply the advice we give for ORA-12725, but this time look at your use of square brackets. Also, while an expression such as '[a' will cause this error, an expression such as 'a]' will not, because a closing (right) bracket is treated as a regular character unless it is preceded by an opening (left) bracket.
ORA-12727: invalid back reference in regular expression
You wrote a backreference to a subexpression that does not exist, or that does not yet exist. For example, '
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!

Return to Oracle Regular Expressions Pocket Reference