
Chapter 12
Lexer

12.1 Token matching

Tokens are defined by their regular expressions (see 6.2). TPG builds a single regular expression by combining the regular expressions of all tokens in an "or" structure (an alternation), each one wrapped in a named group. For example, to recognize int ([0-9]+) and word ([a-zA-Z]+), TPG builds this composite expression:

(?P<int>[0-9]+)|(?P<word>[a-zA-Z]+)

This expression is then compiled using the re module.
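The following sketch shows how such a composite expression can be built and used with Python's re module. It is an illustration, not TPG's actual code: the TOKENS list and the match_token helper are hypothetical names. The name of the token that matched is recovered with Match.lastgroup, which gives the name of the named group that took part in the match.

```python
import re

# Hypothetical token table: (token name, regular expression), as in the int/word example.
TOKENS = [
    ("int", r"[0-9]+"),
    ("word", r"[a-zA-Z]+"),
]

# Wrap each expression in a named group and join them with "|" (alternation).
composite = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKENS)
LEXER_RE = re.compile(composite)


def match_token(text, pos=0):
    """Return (token_name, token_text) for the token starting at pos, or None."""
    m = LEXER_RE.match(text, pos)
    if m is None:
        return None
    # lastgroup is the name of the named group that actually matched.
    return m.lastgroup, m.group()


print(composite)                # (?P<int>[0-9]+)|(?P<word>[a-zA-Z]+)
print(match_token("42 abc"))     # ('int', '42')
print(match_token("42 abc", 3))  # ('word', 'abc')
```

Note that Python's re tries the alternatives from left to right and takes the first one that matches, not the longest one, so the order in which the token expressions are assembled matters when two of them can match the same text.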

For each token we save its name, its text, its value (i.e. the result of its action applied to its text), its line number, and the start and end positions of the token in the input string.
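As a rough sketch of the information kept for each token, the following hypothetical Token dataclass and tokenize function record the name, text, value, line number and start/end positions. The token table, the use of the Python built-ins int and str as actions, and the choice to skip whitespace between tokens are assumptions made for this example only; they do not describe TPG's internals.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Token:
    name: str    # token name, e.g. "int"
    text: str    # matched text
    value: Any   # result of the token's action applied to its text
    line: int    # line number (1-based) where the token starts
    start: int   # index of the first character in the input string
    end: int     # index just past the last character


# Hypothetical token table: name -> (regular expression, action).
TOKEN_DEFS: dict[str, tuple[str, Callable[[str], Any]]] = {
    "int": (r"[0-9]+", int),
    "word": (r"[a-zA-Z]+", str),
}
SPACES = re.compile(r"\s+")  # assumption: whitespace separates tokens and is skipped
LEXER_RE = re.compile("|".join(f"(?P<{n}>{rx})" for n, (rx, _) in TOKEN_DEFS.items()))


def tokenize(text: str) -> list[Token]:
    tokens = []
    pos, line = 0, 1
    while pos < len(text):
        m = SPACES.match(text, pos)
        if m:
            # Skip separators, counting newlines to keep the line number up to date.
            line += m.group().count("\n")
            pos = m.end()
            continue
        m = LEXER_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"lexical error at line {line}, position {pos}")
        name, tok_text = m.lastgroup, m.group()
        action = TOKEN_DEFS[name][1]
        tokens.append(Token(name, tok_text, action(tok_text), line, m.start(), m.end()))
        line += tok_text.count("\n")
        pos = m.end()
    return tokens
```

For example, tokenize("x 42\ny 7") yields four tokens; the second one is Token("int", "42", 42, 1, 2, 4), whose value is the integer produced by the action rather than the string "42".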

There is a special token named EOF used as the erroneous token when a lexical error appears near the end of the input.
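One possible way to picture the role of EOF is sketched below, under the assumption (not stated by TPG) that an error detected after the last real token has no real token to blame, so a synthetic EOF token placed at the very end of the input is reported instead. The Token dataclass, eof_token and erroneous_token are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Token:
    name: str
    text: str
    value: Any
    line: int
    start: int
    end: int


def eof_token(text: str) -> Token:
    """Build a hypothetical EOF token located at the very end of the input."""
    line = text.count("\n") + 1
    return Token("EOF", "", None, line, len(text), len(text))


def erroneous_token(tokens: list[Token], index: int, text: str) -> Token:
    """Return the token to report for an error at position `index` in `tokens`.

    If the error occurs past the last real token (i.e. at the end of the
    input), there is no real token to report, so an EOF token is used instead.
    """
    if index < len(tokens):
        return tokens[index]
    return eof_token(text)
```

With this convention, an error report always refers to some token, so the code that formats error messages does not need a separate case for the end of the input.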