3.1 Scanner

The job of a Scanner, or lexical analyzer, is to identify the low-level constructs of a language: its atoms, such as identifiers, keywords, numbers, operators, and string literals. Because each of these constructs can be described by a regular language, a Scanner is designed from regular expressions and implemented as a finite-state machine.
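To make the link between regular expressions and finite-state machines concrete, here is a minimal table-driven sketch. It assumes just two token classes, identifiers matching `[A-Za-z_][A-Za-z0-9_]*` and unsigned integers matching `[0-9]+`. The state names, `char_class`, and `classify` are hypothetical names chosen for illustration, not part of any particular compiler.

```python
from typing import Optional

# Hypothetical DFA for two illustrative token classes:
#   IDENT:  [A-Za-z_][A-Za-z0-9_]*
#   NUMBER: [0-9]+
START, IN_IDENT, IN_NUMBER, ERROR = "start", "ident", "number", "error"

# Accepting states map to the token kind they recognize.
ACCEPTING = {IN_IDENT: "IDENT", IN_NUMBER: "NUMBER"}


def char_class(ch: str) -> str:
    """Collapse the input alphabet into the few classes the DFA distinguishes."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# TRANSITIONS[state][class] -> next state; any missing entry means ERROR.
TRANSITIONS = {
    START:     {"letter": IN_IDENT, "digit": IN_NUMBER},
    IN_IDENT:  {"letter": IN_IDENT, "digit": IN_IDENT},
    IN_NUMBER: {"digit": IN_NUMBER},
}


def classify(lexeme: str) -> Optional[str]:
    """Run the DFA over the whole lexeme; return its token kind, or None if rejected."""
    state = START
    for ch in lexeme:
        state = TRANSITIONS.get(state, {}).get(char_class(ch), ERROR)
        if state == ERROR:
            return None
    # Only a lexeme that ends in an accepting state is a valid token.
    return ACCEPTING.get(state)
```

For example, `classify("count1")` returns `"IDENT"`, `classify("42")` returns `"NUMBER"`, and `classify("4x")` returns `None` because no transition leaves the number state on a letter. Grouping characters into classes keeps the table small. Scanner generators such as lex build on the same table-driven idea, with many more states.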

The basic scanning process examines the input source code character by character and identifies the atoms it contains. A real Scanner must also do several further jobs. The first is to supply an internal representation of the atoms, called tokens, to the next phase: the parser. Consider a grammar for an arithmetic expression.
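Before turning to the grammar itself, the scanning loop described above can be sketched for arithmetic expressions. This is a minimal sketch, assuming single-character operators, unsigned integer literals, and ASCII identifiers. The `Token` record, its kind names such as `PLUS` and `NUMBER`, and the `scan` function are hypothetical. The loop consumes the input one character at a time, groups characters into the longest possible lexeme, and returns the token list a parser would read, ending with an `EOF` marker.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str    # token class, e.g. "IDENT", "NUMBER", "PLUS"
    lexeme: str  # the exact source characters that formed the token
    pos: int     # offset of the token's first character in the input


# Single-character operators; the kind names are illustrative assumptions.
OPERATORS = {"+": "PLUS", "-": "MINUS", "*": "TIMES", "/": "DIVIDE",
             "(": "LPAREN", ")": "RPAREN"}


def is_digit(ch: str) -> bool:
    return ch.isascii() and ch.isdigit()


def is_ident_char(ch: str) -> bool:
    return ch.isascii() and (ch.isalnum() or ch == "_")


def scan(source: str) -> List[Token]:
    """Examine source one character at a time and return its token stream."""
    tokens: List[Token] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                  # whitespace separates tokens; it is not one
            i += 1
        elif ch in OPERATORS:             # one-character operator token
            tokens.append(Token(OPERATORS[ch], ch, i))
            i += 1
        elif is_digit(ch):                # NUMBER: longest run of digits
            start = i
            while i < len(source) and is_digit(source[i]):
                i += 1
            tokens.append(Token("NUMBER", source[start:i], start))
        elif is_ident_char(ch):           # IDENT: starts with a letter or "_" here,
            start = i                     # because digits were handled above
            while i < len(source) and is_ident_char(source[i]):
                i += 1
            tokens.append(Token("IDENT", source[start:i], start))
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    tokens.append(Token("EOF", "", len(source)))  # end-of-input marker for the parser
    return tokens
```

For example, `scan("a + 12 * b")` yields tokens of kinds `IDENT`, `PLUS`, `NUMBER`, `TIMES`, `IDENT`, `EOF`. Recording the offset in `pos` is one possible design choice: it lets later phases report errors at the right place in the source.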

Fig. 3.1 Phases of a compiler: Scanner

E −> E + T ...
