Chapter 2. Using Flex

In this chapter we’ll take a closer look at flex as a standalone tool, with some examples that exercise most of its C language capabilities. All of flex’s facilities are described in Chapter 5, and the usage of flex scanners in C++ programs is described in Chapter 9.

Regular Expressions

The patterns at the heart of every flex scanner use a rich regular expression language. A regular expression is a pattern description using a metalanguage, a language that you use to describe what you want the pattern to match. Flex’s regular expression language is essentially POSIX-extended regular expressions (which is not surprising considering their shared Unix heritage). The metalanguage uses standard text characters, some of which represent themselves and others of which represent patterns. All characters other than the ones listed below, including all letters and digits, match themselves.

The characters with special meaning in regular expressions are:

.

Matches any single character except the newline character (\n).

[]

A character class that matches any character within the brackets. If the first character is a circumflex (^), it changes the meaning to match any character except the ones within the brackets. A dash inside the square brackets indicates a character range; for example, [0-9] means the same thing as [0123456789] and [a-z] means any lowercase letter. A - or ] as the first character after the [ is interpreted literally to let you include dashes and square brackets ...
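To make the rules for `.` and bracket classes concrete, here is a small hand-written C sketch that tests a single character against them. It is only an illustration, not how flex implements matching, and the function names `dot_matches` and `class_matches` are hypothetical. The class parser is deliberately simplified: it handles a leading `^`, ranges such as `0-9`, and a literal `-` or `]` immediately after the opening `[` (or `[^`), but not escape sequences or other class syntax.

```c
#include <stdbool.h>

/* Hypothetical: the "." rule, which accepts any character except newline. */
bool dot_matches(char c)
{
    return c != '\n';
}

/* Hypothetical, simplified bracket-class test. `cls` must start with '['
 * and end with ']'. Supports a leading '^' (negation), ranges like a-z,
 * and a '-' or ']' right after '[' or '[^' taken literally. A '-' just
 * before the closing ']' is also taken literally. No escapes. */
bool class_matches(const char *cls, char c)
{
    const char *p = cls + 1;   /* skip the opening '[' */
    bool negate = false;
    bool found = false;

    if (*p == '^') {           /* ^ first: match anything NOT listed */
        negate = true;
        p++;
    }
    if (*p == ']' || *p == '-') {   /* leading ] or - is a literal member */
        if (c == *p)
            found = true;
        p++;
    }
    while (*p != '\0' && *p != ']') {
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            /* Range such as 0-9: compare by character code. */
            if ((unsigned char)c >= (unsigned char)p[0] &&
                (unsigned char)c <= (unsigned char)p[2])
                found = true;
            p += 3;
        } else {
            if (c == *p)       /* ordinary member character */
                found = true;
            p++;
        }
    }
    return negate ? !found : found;
}
```

With this sketch, `class_matches("[0-9]", '7')` is true, `class_matches("[-+]", '-')` is true, and `class_matches("[]abc]", ']')` is true. Note that the C string `"[^\n]"` contains a real newline character, so in this sketch it accepts exactly the characters that `dot_matches` accepts.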
