XIDEK
|
dxi: Direct Execution Interpreters
IntroductionThe baseline "Direct Execution" interpreters, found in cll\base\dxi.syn and pll\base\dxi.syn execute the statements in a script as they parse the script. For most aspects of the interpreter, this is an extremely simple and straightforward way to approach the problem. It does not require extensive support modules as is the case with other approaches.This immediacy, however, is purchased at the price of substantial complications in parsing. Because the parser must somehow repeat the body of loops as indicated by the loop control parameters or skip a block of code as directed by an if statement, some substantial changes to the syntax are required. In addition, the parser must also support a variety of execution modes. Apart from these problems, however, the dxi interpreters are straightforward and easy to modify. For many problems, i.e., those which do not add new loop or conditional constructs, direct execution is probably the easiest approach. In what follows, we describe first the features of the CLL version of the dxi.syn syntax file and how it differs from the cll.syn file. Then we describe the differences required for the PLL version.
C PrologueIntroductionThe C Prologue is the initial piece of embedded C/C++ code in a syntax file. AnaGram copies it to the parser file before writing any generated code, so it is the appropriate place to include definitions and declarations needed by the parser. In particular, any data type that is used as a token type must be defined in the C prologue, either explicitly or by means of an include file.
Include FileThe iostream.h system header file is included to support the file output performed by the dump and print statements.
Mode EnumerationWhen executing a script at parse time, it is often necessary to skip over a portion of the script without executing it. When skipping over code, it is also necessary to be able to determine when it is time to stop skipping code. The Mode enumeration identifies the various states the parser may occupy with respect to skipping code. The values of the enumeration are as follows:
ParseContext StructThe ParseContext struct is used in place of the FileLocation class to capture and record the mode as well as the location of the parser within the script. The class inherits from FileLocation and therefore captures the input pointer, the line number and the column number. The struct adds the following field: In the configuration section of the syntax file, the context type is specified as ParseContext. The GET_CONTEXT macro calls the location() member function of the parser control block which in turn constructs a ParseContext object to store on the context stack.The value of ParseContext corresponding to the beginning of a rule is given by the CONTEXT and PCONTEXT(PCB) macros.
VALUE MacroThe VALUE macro controls the evaluation of expressions in accordance with the current execution mode. It also checks the result for errors.The argument to the macro is an expression. If the current mode is executeMode the expression is evaluated. If the mode is anything else, an uninitialized Value object is returned. Note that the expression is evaluated using the arithmetic operators defined in the Value class. If they encounter an error, such as division by zero, they will throw an exception.
Configuration SectionConfiguration parameters that are common to all the parsers in the kit are discussed here. The following additional configuration parameters are set in the CLL version of dxi.syn:
Parser Control Block ExtensionsThe parser control block is a struct used in an AnaGram parser to contain the entire state of a parse.In order to support reentrancy, it is convenient to declare local data used by the parser in the parser control block. It is also convenient to declare functions used by the parser as members of the parser control block, though this is not, strictly speaking, necessary.
Added Fields
Local FunctionsThe following functions are declared to be member functions of the parser control block. The actual implementations of these functions are found in the embedded C portion of the syntax file.
Modifications to the SyntaxIntroductionThere are two sets of modifications to the language syntax in the dxi interpreters. The first is straightforward: to simplify calculation, arithmetic operators, which in the language syntax are factored out of the expression syntax, have been substituted back into the expression syntax. The only significant effect of this substitution is to increase the number of states in the parser.The alternative would be to pass an appropriate function pointer as the value of the various expression tokens. Such an approach would detract seriously from the basic simplicity of the direct execution approach to interpreting. The second set of modifications results from the problems of implementing loops. In order to compute while parsing it is necessary to repeat the code inside loops. There are several ways to do this of varying complexity. It is possible, for example, to "unroll" loops, by backing up the input pointer and reparsing the loop. The syntax that correctly describes the actual input to the parser then is quite different from, and substantially more complex than, the syntax of the actual script language. Writing a nonambiguous syntax to describe this process is nontrivial. Fortunately, there is a simpler technique for handling loops. In this approach, the syntax for the main parser identifies only the loop header. A reduction procedure then makes recursive calls to the parser to parse the loop. When the reduction procedure returns, the loop has been correctly executed and the input pointer has been advanced over the body of the loop, so that from the point of view of the main parser, the body of the loop doesn't even exist. To the main parser, a while loop consists simply of the keyword "while". A do while loop consists solely of the keyword "do". A for loop is more complicated. It retains the entire control portion of the loop. The reason for this is that the condition and increment fields of the loop do not appear physically in the script in the order that they are executed. Thus, the parser must do a preliminary scan to determine the locations of these fields. Two functions, simpleLoop() and forLoop() have been added to the parser control block to control the execution of loops. Both of these functions invoke parseSyntax() to actually make the recursive calls to the parser. This approach makes the execution of loops quite easy, but at the expense of setting the parser up to handle the recursive calls correctly. In particular, this requires setting up the grammar so that the actual syntax to be parsed can be specified when the parser is called.
Syntax SelectionIn order to support the recursive calls used to implement loops, the grammar for the direct execution parsers has been modified to allow the program that calls the parser to specify precisely what it should parse.The technique used here is to use a number of "selector" keywords to select the specific syntax:
The way it works is as follows. Each selector keyword has a wrapper token: SCRIPT -> "script" =PCB.setStart(); WHILE LOOP -> "whileLoop" =PCB.setStart(); DO LOOP -> "do" =PCB.setStart(); STATEMENT -> "statement" =PCB.setStart(); FOR FIELD -> "forField" =PCB.setStart();The wrapper tokens, in turn, are used to select the actual syntax to be parsed. This is done in the specification of the "start" token for the grammar: (Value) grammar -> SCRIPT, statements, eof =Value(); -> WHILE LOOP, loop condition, statement =Value(); -> DO LOOP, statement, DO WHILE, loop condition =Value(); -> STATEMENT, statement =Value(); -> FOR FIELD, optional expression:x, ';' + ')' =x;When the selector keyword is recognized, the setStart() function is called to change the input pointer in the parser control block to point to the position in the script which was previously provided as an argument to the parseSyntax() function. The line and column numbers are set properly as well, in order to make sure the location of error diagnostics can be correctly identified. Note that the parser must be prepared to return a Value object to support the for field case. There is one other rather subtle detail that must be handled correctly to make this scheme work. The disregard white space feature is implemented by rewriting the grammar to tie optional white space to each lexeme. That means that, were no special measures taken, when the rewrite takes place white space would be tied to the selector keywords and would be parsed over before the reduction procedures are called. In and of itself, there's no problem with this, except that the white space that needs to be skipped is that which is seen after the reduction procedure tied to the selector keyword has been called and the input pointer changed. Therefore, the wrapper tokens have been declared as lexemes. This causes the white space to be scanned only after the reduction procedures have been invoked to modify the input pointer. One unfortunate aspect of this selector method is that the File Trace feature of AnaGram cannot be used to parse a sample script file. This happens because the actual stream of characters seen by the parser differs substantially from the content of the script file.
While and Do While LoopsBecause while and do while loops are executed recursively by calls to parseSyntax(), which advances the input pointer over the body of the loop, while and do while statements are represented in the main syntax simply by the keywords "while" and "do" respectively.The function simpleLoop() controls the actual execution of the loops. simpleLoop() in turn, calls parseSyntax() recursively, using the specified selector string as an argument. This change in the representation of the while statement means that the while statement is no longer recursive, as seen from the primary syntax. Thus the while statement is no longer involved in the "dangling else" problem. For example, consider the statement: if (x) while (y) if (x < y) {...} else {...}Because the second if statement is completely invisible to the primary syntax, there is no possible question about which if the else is associated with. Thus instead of having both open while and closed while statements in the grammar, there is only one while statement, and it is classified as a simple statement. There is an important point to notice in the parsing of the WHILE LOOP and DO LOOP syntaxes. In the reduction procedure for the loop condition rule, which is used for both while loops and do while loops, the outcome of the condition is tested only if the parser mode is executeMode. Otherwise, the mode is always set to breakMode, guaranteeing an exit from the loop. In order to handle the loop condition properly, the "while" keyword used in the DO LOOP syntax is wrapped by the DO WHILE token. The reduction procedure attached to the wrapper calls changeMode()to change the parser mode back to executeMode if has been set to continueMode in the course of executing the loop.
For LoopsThe treatment of for loops is substantially more complicated than that of while and do while loops. A for statement consists of four separate fields: the initialization field, the condition field, the increment field, and the body of the loop itself. These fields must be executed in the correct order, an order which differs from the order in which the fields appear physically in the script.The first field in a for statement is the initialization field. It must be executed whatever the outcome of the loop condition. Of course, it is not executed if the parser mode is not executeMode. Because the condition field and the increment field must be executed independently, under the control of the forLoop() function, the parser mode is set to scanMode, so that the locations of these fields can be captured without executing them. Since the expressions in these fields can have side effects, executing them when they should not be executed could lead to errors in the values calculated by the script. The locations of the condition and increment fields are passed as arguments to forLoop(), which, as its first order of business, calls restoreMode() to reset the parser mode to the value it had when the for loop was encountered. Thus, if the mode was anything but executeMode, the entire body of the loop will be skipped over without being executed. As with the while and do while loops, the body of a for loop is not visible to the script syntax, and thus, like them, is not involved in the "dangling else" problem and can therefore be treated as a simple statement.
Embedded CMacro Definitions
ParseContext Member Functions
Parser Control Block Member Functions
The External Interface: interpret()The external interface for the interpreter declares an instance of a parser control block. It then calls the parseSyntax() function specifying the syntax selector "script" and a FileLocation object specifying the starting location of the script.
PLL DifferencesIntroductionThe differences between the pll version of dxi.syn and pll.syn are very nearly identical to the corresponding CLL differences discussed above. The exceptions are as follows:
ForControl Struct
IntroductionThe ForControl structure, defined in the C Prolog section of the syntax file, is used only by the PLL version of the dxi interpreter. It is used to encapsulate for loop parameters. Constructors have been defined as a convenience, though they are not really necessary. Note that since the struct contains an instance of the Value class, and is used as a token value in the parse, there must be a wrapper declaration for it in the configuration section of the file.
Member Fields
Constructors
Parser Control Block Member Functions
|
Table of Contents | | | Parsifal Software Home Page |
---|
XIDEK
Extensible Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.