flex & bison

by John Levine

August 2009

Intermediate to advanced

292 pages

7h 31m

English

O'Reilly Media, Inc.

Scope of This Book; Conventions Used in This Book; Getting Flex and Bison; This Book’s Example Files; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
Lexical Analysis and Parsing; Regular Expressions and Scanning; Our First Flex Program; Programs in Plain Flex; Putting Flex and Bison Together; The Scanner as Coroutine; Tokens and Values; Grammars and Parsing; BNF Grammars; Bison’s Rule Input Language; Compiling Flex and Bison Programs Together; Ambiguous Grammars: Not Quite; Adding a Few More Rules; Flex and Bison vs. Handwritten Scanners and Parsers; Exercises
Regular Expressions; Regular Expression Examples; How Flex Handles Ambiguous Patterns; Context-Dependent Tokens; File I/O in Flex Scanners; Reading Several Files; The I/O Structure of a Flex Scanner; Input to a Flex Scanner; Flex Scanner Output; Start States and Nested Input Files; Symbol Tables and a Concordance Generator; Managing Symbol Tables; Using a Symbol Table; C Language Cross-Reference; Exercises
How a Bison Parser Matches Its Input; Shift/Reduce Parsing; What Bison’s LALR(1) Parser Cannot Parse; A Bison Parser; Abstract Syntax Trees; An Improved Calculator That Creates ASTs; Literal Character Tokens; Building the AST Calculator; Shift/Reduce Conflicts and Operator Precedence; When Not to Use Precedence Rules; An Advanced Calculator; Advanced Calculator Parser; Calculator Statement Syntax; Calculator Expression Syntax; Top-Level Calculator Grammar; Basic Parser Error Recovery; The Advanced Calculator Lexer; Reserved Words; Building and Interpreting ASTs; Evaluating Functions in the Calculator; User-Defined Functions; Using the Advanced Calculator; Exercises
A Quick Overview of SQL; Relational Databases; Manipulating Relations; Three Ways to Use SQL; SQL to RPN; The Lexer; Scanning SQL Keywords; Scanning Numbers; Scanning Operators and Punctuation; Scanning Functions and Names; Comments and Miscellany; The Parser; The Top-Level Parsing Rules; SQL Expressions; Functions; Other expressions; Select Statements; Select options and table references; SELECT table references; Delete Statement; Insert and Replace Statements; Replace statement; Update Statement; Create Database; Create Table; User Variables; The Parser Routines; The Makefile for the SQL Parser; Exercises
Structure of a Flex Specification; Definition Section; Rules Section; User Subroutines; BEGIN; C++ Scanners; Context Sensitivity; Left Context; Right Context; Definitions (Substitutions); ECHO; Input Management; Stdio File Chaining; Input Buffers; Input from Strings; File Nesting; input(); YY_INPUT; Flex Library; Interactive and Batch Scanners; Line Numbers and yylineno; Literal Block; Multiple Lexers in One Program; Combined Lexers; Multiple Lexers; Options When Building a Scanner; Portability of Flex Lexers; Porting Generated C Lexers; Buffer sizes; Character sets; Reentrant Scanners; Extra Data for Reentrant Scanners; Access to Reentrant Scanner Data; Reentrant Scanners, Nested Files, and Multiple Scanners; Using Reentrant Scanners with Bison; Regular Expression Syntax; Metacharacters; REJECT; Returning Values from yylex(); Start States; unput(); yyinput() yyunput(); yyleng; yyless(); yylex() and YY_DECL; yymore(); yyrestart(); yy_scan_string and yy_scan_buffer; YY_USER_ACTION; yywrap()
Structure of a Bison Grammar; Symbols; Definition Section; Rules Section; User Subroutines Section; Actions; Embedded Actions; Symbol Types for Embedded Actions; Ambiguity and Conflicts; Types of Conflicts; Shift/Reduce Conflicts; Reduce/Reduce Conflicts; %expect; GLR Parsers; Bugs in Bison Programs; Infinite Recursion; Interchanging Precedence; Embedded Actions; C++ Parsers; %code Blocks; End Marker; Error Token and Error Recovery; %destructor; Inherited Attributes ($0); Symbol Types for Inherited Attributes; %initial-action; Lexical Feedback; Literal Block; Literal Tokens; Locations; %parse-param; Portability of Bison Parsers; Porting Bison Grammars; Porting Generated C Parsers; Libraries; Character Codes; Precedence and Associativity Declarations; Precedence; Associativity; Precedence Declarations; Using Precedence and Associativity to Resolve Conflicts; Typical Uses of Precedence; Recursive Rules; Left and Right Recursion; Rules; Special Characters; %start Declaration; Symbol Values; Declaring Symbol Types; Explicit Symbol Types; Tokens; Token Numbers; Token Values; %type Declaration; %union Declaration; Variant and Multiple Grammars; Combined Parsers; Multiple Parsers; Using %name-prefix or the -p Flag; Lexers for Multiple Parsers; Pure Parsers; y.output Files; Bison Library; main(); yyerror(); YYABORT; YYACCEPT; YYBACKUP; yyclearin; yydebug and YYDEBUG; YYDEBUG; yydebug; yyerrok; YYERROR; yyerror(); yyparse(); YYRECOVERING()
The Pointer Model and Conflicts; Kinds of Conflicts; Parser States; Contents of name.output; Reduce/Reduce Conflicts; Shift/Reduce Conflicts; Review of Conflicts in name.output; Common Examples of Conflicts; Expression Grammars; IF/THEN/ELSE; Nested List Grammar; How Do You Fix the Conflict?; IF/THEN/ELSE (Shift/Reduce); Loop Within a Loop (Shift/Reduce); Expression Precedence (Shift/Reduce); Limited Lookahead (Shift/Reduce or Reduce/Reduce); Overlap of Alternatives (Reduce/Reduce); Summary; Exercises
Error Reporting; Locations; Adding Locations to the Parser; Adding Locations to the Lexer; More Sophisticated Locations with Filenames; Error Recovery; Bison Error Recovery; Freeing Discarded Symbols; Error Recovery in Interactive Parsers; Where to Put Error Tokens; Compiler Error Recovery; Exercises

Pure Scanners and Parsers; Pure Scanners in Flex; Pure Parsers in Bison; Using Pure Scanners and Parsers Together; A Reentrant Calculator; GLR Parsing; GLR Version of the SQL Parser; C++ Parsers; A C++ Calculator; C++ Parser Naming; A C++ Parser; Interfacing a Scanner with a C++ Parser; Should You Write Your Parser in C++?; Exercises

Chapter 1. Introducing Flex and Bison

Flex and Bison are tools for building programs that handle structured input. They were originally tools for building compilers, but they have proven to be useful in many other areas. In this first chapter, we’ll start by looking at a little (but not too much) of the theory behind them, and then we’ll dive into some examples of their use.

Lexical Analysis and Parsing

The earliest compilers back in the 1950s used utterly ad hoc techniques to analyze the syntax of the source code of programs they were compiling. During the 1960s, the field got a lot of academic attention, and by the early 1970s, syntax analysis was a well-understood field.

One of the key insights was to break the job into two parts: lexical analysis (also called lexing or scanning) and syntax analysis (or parsing).

Roughly speaking, scanning divides the input into meaningful chunks, called tokens, and parsing figures out how the tokens relate to each other. For example, consider this snippet of C code:

alpha = beta + gamma ;

A scanner divides this into the tokens alpha, equal sign, beta, plus sign, gamma, and semicolon. Then the parser determines that beta + gamma is an expression, and that the expression is assigned to alpha.
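
To make the scanning step concrete, here is a minimal handwritten sketch in C of a scanner for just this one statement form. It is not code from the book and it is not what flex generates; the names token_kind, struct token, next_token, and the TOK_ constants are all made up for illustration, and the 31-character limit on names is an arbitrary assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not names used by flex or bison. */
enum token_kind { TOK_NAME, TOK_ASSIGN, TOK_PLUS, TOK_SEMI, TOK_END, TOK_ERROR };

struct token {
    enum token_kind kind;
    char text[32];   /* the matched characters, NUL-terminated */
};

/* Read one token starting at *src and advance *src past it. */
struct token next_token(const char **src)
{
    struct token tok = { TOK_END, "" };
    const char *p = *src;

    /* Whitespace only separates tokens; it is never a token itself. */
    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        /* End of input: tok is already TOK_END with empty text. */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* A name: a letter or underscore, then letters, digits, underscores. */
        size_t n = 0;
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof tok.text - 1)   /* silently truncate long names */
                tok.text[n++] = *p;
            p++;
        }
        tok.text[n] = '\0';
        tok.kind = TOK_NAME;
    } else {
        /* Single-character operators and punctuation. */
        tok.text[0] = *p;
        tok.text[1] = '\0';
        switch (*p) {
        case '=': tok.kind = TOK_ASSIGN; break;
        case '+': tok.kind = TOK_PLUS;   break;
        case ';': tok.kind = TOK_SEMI;   break;
        default:  tok.kind = TOK_ERROR;  break;
        }
        p++;
    }

    *src = p;
    return tok;
}
```

Calling next_token repeatedly with a pointer into the string "alpha = beta + gamma ;" yields a name (alpha), an assignment sign, a name (beta), a plus sign, a name (gamma), a semicolon, and then the end marker. A parser would consume that stream of tokens to recognize beta + gamma as an expression and the whole line as an assignment to alpha.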