Flex and bison are tools designed for writers of compilers and interpreters, although they are also useful for many applications that will interest noncompiler writers. Any application that looks for patterns in its input or has an input or command language is a good candidate for flex and bison. Furthermore, they allow for rapid application prototyping, easy modification, and simple maintenance of programs. To stimulate your imagination, here are a few things people have used flex and bison, or their predecessors lex and yacc, to develop:
The desktop calculator bc
The tools eqn and pic, typesetting preprocessors for mathematical equations and complex pictures
Many other “domain-specific languages” targeted for a particular application
PCC, the Portable C Compiler used with many Unix systems
A SQL database language translator
Chapter 1, Introducing Flex and Bison, gives an overview of how and why flex and bison are used to create compilers and interpreters and demonstrates some simple applications including a calculator built in flex and bison. It also introduces basic terms we use throughout the book.
Chapter 2, Using Flex, describes how to use flex. It develops flex applications that count words in files, handle multiple and nested input files, and compute statistics on C programs.
Chapter 3, Using Bison, gives a full example using flex and bison to develop a fully functional desktop calculator with variables, procedures, loops, and conditional expressions. It shows the use of abstract syntax trees (ASTs), powerful and easy-to-use data structures for representing parsed input.
Chapter 4, Parsing SQL, develops a parser for the MySQL dialect of the SQL relational database language. The parser checks the syntax of SQL statements and translates them into an internal form suitable for an interpreter. It shows the use of Reverse Polish Notation (RPN), another powerful form used to represent and interpret parsed input.
Chapter 5, A Reference for Flex Specifications, and Chapter 6, A Reference for Bison Specifications, provide detailed descriptions of the features and options available to flex and bison programmers. These chapters and the two that follow provide technical information for the now-experienced flex and bison programmer to use while developing flex and bison applications.
Chapter 7, Ambiguities and Conflicts, explains bison ambiguities and conflicts, which are grammar problems that keep bison from creating a parser from a grammar. It then develops methods that can be used to locate and correct such problems.
Chapter 8, Error Reporting and Recovery, discusses techniques that compiler or interpreter designers can use to locate, recognize, and report errors in the compiler input.
Chapter 9, Advanced Flex and Bison, covers reentrant scanners and parsers, Generalized Left to Right (GLR) parsers that can handle grammars that regular bison parsers can’t, and interfaces to C++.
The appendix provides the complete grammar and a cross-reference for the SQL parser discussed in Chapter 4.
The glossary lists technical terms from language and compiler theory.
We presume you are familiar with C, because most examples are in C, flex, or bison, with a few in C++ and the remainder in SQL or the special-purpose languages developed within the text.
The following conventions are used in this book:
Used for new terms and concepts when they are introduced.
Used for program listings, as well as within paragraphs to refer to program elements such as statements, classes, macros, states, rules, all code terms, and files and directories.
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
is the shell prompt.
surround optional elements in a description of program syntax. (Don’t type the brackets themselves.)
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.
Flex and bison are modern replacements for the classic lex and yacc that were both developed at Bell Laboratories in the 1970s. Yacc was the first of the two, developed by Stephen C. Johnson. Lex was designed by Mike Lesk and Eric Schmidt (the same Eric Schmidt who now heads Google) to work with yacc. Both lex and yacc have been standard Unix utilities since Seventh Edition Unix in the 1970s.
The GNU Project of the Free Software Foundation distributes bison, a foreward-compatible replacement for yacc. It was originally written by Robert Corbett and Richard Stallman. The bison manual is excellent, especially for referencing specific features. Bison is included with all common distributions of BSD and Linux, but if you want the most up-to-date version, its home page is:
BSD and the GNU Project also distribute flex (Fast Lexical Analyzer Generator), “a rewrite of lex intended to fix some of that tool’s many bugs and deficiencies.” Flex was originally written by Jef Poskanzer; Vern Paxson and Van Jacobson have considerably improved it. Common distributions of BSD and Linux include a copy of flex, but if you want the latest version, it’s now hosted at SourceForge:
The programs in this book are available online as:
They can be downloaded by any web browser or FTP client. The zip format file can be decoded by the popular freeware unzip utility on Unix-ish and Linux systems or opened as a compressed folder on Windows XP or newer.
The examples in the book were all tested with flex version 2.5.35 and bison 2.4.1.
This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Book Title by Some Author. Copyright 2008 O’Reilly Media, Inc., 978-0-596-xxxx-x.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at firstname.lastname@example.org.
When you see a Safari® Books Online icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.
Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://my.safaribooksonline.com.
Despite all the help, errors remain the author’s responsibility.
When you find some, or if you have other comments, email them to
email@example.com, being sure to include the name of the
book in the subject line to alert the spam filters that you are a real
person rather than a deceased kleptocrat from a developing country. Or
drop by the Usenet group
comp.compilers where questions about compiler
tools are always on topic.
You can also address comments and questions concerning this book to the publisher:
|O’Reilly Media, Inc.|
|1005 Gravenstein Highway North|
|Sebastopol, CA 95472|
|800-998-9938 (in the United States or Canada)|
|707-829-0515 (international or local)|
|707 829-0104 (fax)|
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:
To comment or ask technical questions about this book, send email to:
For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our web site at:
Every book is the work of many people, and this one is no exception. I thank Tony Mason and Doug Brown, my coauthors of lex & yacc, for permission to adapt parts of that book. Many people provided useful comments on the draft manuscript, including Lorenzo Bettini, Joel E. Denny, Danny Dubé, Ben Hanson, Jan Van Katwijk, Jeff Kenton, Timothy Knox, Chris Morley, Ken Rose, and Juha Vihavainen. I particularly thank Derek M. Jones, who provided a detailed page-by-page review in an unreasonably short time. Simon St. Laurent, my long-suffering editor, as always shepherded the book skillfully and without complaint through the editorial and production process.