AnaGram Parser Generator
AnaGram interprets a single character, such as 'x', to mean the set that contains only that character.
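This minimal C sketch makes the idea of a character set concrete. It models a set as a 256-bit membership table and assumes 8-bit characters. The names CharSet, set_add, set_has, single_char and char_range are hypothetical, and this is not AnaGram's internal representation. It only shows two things: a single character denotes a one-element set, and a range such as '0-9' denotes every character between its endpoints.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model (not AnaGram's internal representation):
   a character set as a 256-bit membership table, assuming 8-bit characters. */
typedef struct {
    unsigned char bits[32]; /* one bit per character code 0..255 */
} CharSet;

/* Add character c to set s. */
static void set_add(CharSet *s, unsigned char c) {
    s->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* True if c is a member of s. */
static bool set_has(CharSet s, unsigned char c) {
    return (s.bits[c / 8] >> (c % 8)) & 1u;
}

/* A single character such as 'x' denotes the set containing only 'x'. */
static CharSet single_char(unsigned char c) {
    CharSet s;
    memset(&s, 0, sizeof s);
    set_add(&s, c);
    return s;
}

/* A range such as '0-9' denotes every character from lo to hi inclusive. */
static CharSet char_range(unsigned char lo, unsigned char hi) {
    CharSet s;
    memset(&s, 0, sizeof s);
    for (unsigned c = lo; c <= hi; c++)
        set_add(&s, (unsigned char)c);
    return s;
}
```

For example, set_has(single_char('x'), 'y') is false, while set_has(char_range('0', '9'), '5') is true.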
A character string enclosed in double quotes, such as "while" or "/*", is a keyword. The rules for writing keyword strings are the same as for string literals in C. AnaGram parsers use special lookahead logic to recognize keywords, so a keyword is not equivalent to the corresponding sequence of single characters.
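The difference between a keyword and a character sequence can be sketched as a scanner step. The step tries every keyword before falling back to a single character, so "while" comes back as one token rather than five. This is a hypothetical longest-match illustration. The token codes, the keyword table and next_token are invented for the example and do not reproduce AnaGram's actual lookahead logic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_CHAR = 0, TOK_WHILE = 1, TOK_COMMENT_START = 2 };

typedef struct {
    const char *text; /* keyword spelling */
    int token;        /* token code returned when the keyword matches */
} Keyword;

/* Hypothetical keyword table: "while" and the two-character comment opener. */
static const Keyword keywords[] = {
    { "while", TOK_WHILE },
    { "/*",    TOK_COMMENT_START },
};

/* Look ahead at `input`. If a keyword matches here, return its token code
   and store its length in *len: the whole keyword is ONE token.
   Otherwise return TOK_CHAR with *len = 1 (a plain single character).
   When several keywords match, the longest one wins. */
static int next_token(const char *input, size_t *len) {
    int best_token = TOK_CHAR;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i].text);
        if (n > best_len && strncmp(input, keywords[i].text, n) == 0) {
            best_token = keywords[i].token;
            best_len = n;
        }
    }
    *len = best_len ? best_len : 1;
    return best_token;
}
```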
The units of a grammar are called tokens. Terminal tokens may be character sets, keywords, immediate actions, or virtual productions. Nonterminal tokens are defined in terms of other tokens by means of productions.
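A production consists of a token name, an arrow (->), and a grammar rule, which is a list of tokens separated by commas. For example:

dinner -> appetizer, salad, main course, dessert

The order of the elements in a rule is significant, but productions themselves may appear in any order in a syntax file. A production with more than one name on the left is called a semantically determined production.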
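One possible application of a semantically determined production is telling type names apart from ordinary identifiers, as a C parser must. The sketch assumes a hypothetical production with type name and variable name on the left and identifier on the right. A C function stands in for the reduction procedure: it consults a made-up symbol table to decide which of the two tokens the identifier becomes. How an AnaGram reduction procedure actually reports its choice is not shown here.

```c
#include <string.h>

/* Hypothetical tokens on the left side of a semantically determined
   production such as:  type name, variable name -> identifier  */
typedef enum { TYPE_NAME, VARIABLE_NAME } ReductionToken;

/* Hypothetical symbol table: identifiers previously declared as types. */
static const char *const declared_types[] = { "size_t", "FILE", "point" };

/* Stand-in for the reduction procedure: semantic information, not the
   grammar, decides which left-side token the identifier becomes. */
static ReductionToken classify_identifier(const char *name) {
    for (size_t i = 0; i < sizeof declared_types / sizeof declared_types[0]; i++) {
        if (strcmp(name, declared_types[i]) == 0)
            return TYPE_NAME;
    }
    return VARIABLE_NAME;
}
```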
Additional productions with the same left side may be joined by using | or another arrow. The arrow, if used, must start a new line:

variable name -> letter | variable name, digit

is equivalent to

variable name -> letter
              -> variable name, digit

If the token on the left side of a production is called grammar or is tagged with a following dollar sign, it is taken to be the grammar token, or goal token, for the grammar.
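The left-recursive variable name productions describe one letter followed by any number of digits. Each application of the second production appends one more digit. A hand-written C recognizer for that language might look like the following. It assumes that letter means an ASCII letter and digit means '0-9', since neither token is defined in this section.

```c
#include <ctype.h>
#include <stdbool.h>

/* Recognizes the language of
       variable name -> letter
                     -> variable name, digit
   i.e. one letter followed by zero or more digits.
   Assumption: letter = ASCII letter, digit = '0'..'9'. */
static bool is_variable_name(const char *s) {
    if (!isalpha((unsigned char)s[0]))   /* first production: a letter */
        return false;
    for (s++; *s != '\0'; s++)           /* second production, applied repeatedly */
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}
```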
The names on the left side of a production may be preceded by a type cast indicating the data type of the semantic value of the named tokens.
A virtual production is one of two things:

- a token name or character set expression followed by ? or ?...
- a sequence of one or more rules, joined by |, enclosed in brackets or braces and optionally followed by an ellipsis (...)

The ? marks an optional token. Braces indicate a choice of one of the listed rules; brackets indicate an optional choice. The ellipsis indicates unlimited repetition, so a braced choice followed by an ellipsis occurs one or more times, and a bracketed one zero or more times.
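As a concrete case, consider a hypothetical rule, signed number -> ['+' | '-'], {'0-9'}..., which combines a bracketed optional choice with a braced choice repeated by an ellipsis. The C recognizer spells out which inputs those two virtual productions accept. The function name is invented, and the code describes only the accepted language, not how AnaGram implements virtual productions.

```c
#include <stdbool.h>

/* Hand-written recognizer for the hypothetical rule
       signed number -> ['+' | '-'], {'0-9'}...
   [ ... ]     optional choice: zero or one sign
   { ... }...  choice repeated without limit: one or more digits */
static bool is_signed_number(const char *s) {
    if (*s == '+' || *s == '-')      /* [ '+' | '-' ]: optional */
        s++;
    if (!(*s >= '0' && *s <= '9'))   /* { '0-9' }: at least once */
        return false;
    while (*s >= '0' && *s <= '9')   /* ...: then any number more */
        s++;
    return *s == '\0';
}
```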
A reduction procedure is a piece of C or C++ code, attached to a grammar rule, that is executed when the parser recognizes the rule in its input stream. Reduction procedures come in two forms, both introduced by an equal sign:

- short form: a single expression followed by a semicolon
- long form: a block of code enclosed in braces

A short form procedure may not continue onto another line.
Reduction procedures may access the semantic values of the tokens in the grammar rule to which they are attached. To each token whose value is needed, append a colon and the name of the variable that holds that token's value in the reduction procedure. In a short form procedure, the value of the expression is assigned to the reduction token, which is the token on the left side of the production. In a long form procedure, use a return statement to assign a value to that token.
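The C sketch models the data flow when the parser reduces two hypothetical rules, sum -> sum:x, '+', term:y and sum -> sum:x, '-', term:y. The first uses a short form procedure and the second a long form one. The functions reduce_add, reduce_sub and evaluate are invented. evaluate drives the reductions by hand for single-digit terms, assumes well-formed input, and also assumes a base production sum -> term. It is an illustration, not generated AnaGram code.

```c
/* Models the hypothetical rules:
       sum -> sum:x, '+', term:y = x + y;
       sum -> sum:x, '-', term:y = { int r = x - y; return r; }
   The parser supplies the values tagged x and y; the procedure's result
   becomes the semantic value of the reduction token, sum. */
static int reduce_add(int x, int y) {
    return x + y;        /* short form: the value of the expression */
}

static int reduce_sub(int x, int y) {
    int r = x - y;       /* long form: any block of code ... */
    return r;            /* ... assigning its result with return */
}

/* Drives the reductions by hand for input such as "7+5-3", where every
   term is a single digit (assumed well-formed). Left recursion means the
   reductions happen left to right. */
static int evaluate(const char *s) {
    int sum = *s++ - '0';            /* assumed base case: sum -> term */
    while (*s == '+' || *s == '-') {
        char op = *s++;
        int term = *s++ - '0';
        sum = (op == '+') ? reduce_add(sum, term) : reduce_sub(sum, term);
    }
    return sum;
}
```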
An immediate action differs from a reduction procedure in that it may occur in the middle of a grammar rule. To distinguish it from a reduction procedure, it begins with an exclamation point rather than an equal sign.
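The difference in timing can be illustrated with a hand-written C recognizer for nested braces such as {{}{{}}}. The sketch assumes a hypothetical rule with two pieces of code. An immediate action right after the opening brace enters a new scope while the rule is still being recognized. A reduction-time step leaves the scope only once the closing brace has been matched. The names parse_block, depth and max_depth are invented for the sketch.

```c
#include <stdbool.h>

/* Hypothetical grammar, written informally:
       block -> '{', <immediate action: enter scope>, block?..., '}'
   The immediate action runs as soon as '{' is seen, so the nested blocks
   recognized afterwards already observe its effect. */
static int depth = 0;      /* current nesting level */
static int max_depth = 0;  /* deepest level reached */

static bool parse_block(const char **p) {
    if (**p != '{')
        return false;
    (*p)++;
    /* immediate action: executes in the middle of the rule */
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    while (**p == '{')               /* block?...: zero or more nested blocks */
        if (!parse_block(p))
            return false;
    if (**p != '}')
        return false;
    (*p)++;
    depth--;   /* reduction-time code: runs only after the whole rule matches */
    return true;
}
```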
You may assign names to frequently used character sets, virtual productions, keywords, or immediate actions by using a definition statement consisting of a name, an equal sign and the entity to be named. For example:
digit = '0-9'
A configuration section is a block of special statements enclosed in square brackets. Each statement is either an attribute statement or an assignment of a value to a configuration parameter or switch; all of these are described in the on-line help windows.
You may include C or C++ code to support your reduction procedures at any point in your grammar by enclosing it in braces. The beginning brace must be on a fresh line, and no other statement may follow on the same line as the terminating brace. A block of embedded C at the very beginning of a syntax file is called the C prologue.