Parsifal Software


XIDEK
Extensible Interpreter Development Kit
Reference documentation


astxi: Tree Execution Interpreters

Introduction

The "Abstract Syntax Tree Execution" interpreter, support\base\astxi.cpp creates an abstract syntax tree by using an Abstract Syntax Tree Parser, then executes the statements in a script by traversing the tree. Two versions of the astxi interpreter can be built by using either CLL syntax or PLL syntax. This is a somewhat more complicated method, in terms of support code, than used by the Direct Execution Interpreter, but it has the distinct advantage of substantially greater simplicity in the execution of loops.

If a single script must be executed several times, by using this approach it is possible to arrange to parse the script only once, thus avoiding parsing overhead on subsequent executions. Building an abstract syntax tree is noticeably simpler than creating a bytecode interpreter, as is done in the Direct Compilation Interpreter, and provides some of the same performance benefits.


External Interface

The astxi interpreter is encapsulated in the Interpreter class. The interpreter interface function, interpret(), is implemented by creating an Interpreter object and then invoking the object's public member function, execute().


The Interpreter Class

Introduction

The Interpreter class encapsulates the functions and the data necessary to traverse an abstract syntax tree and execute the statements it describes.

Strictly speaking, this class is unnecessary. The functions in the class could have been written as external functions. They would, however, have had to have another argument in order to pass the reference to the Dataset object. Should any additional parameters become necessary, it would be necessary to change all the function calls to accommodate them. By making the functions members of a class and defining these parameters as class members, the calling sequences are simplified and are unlikely to require changes to accommodate program modifications.


Private Member Fields

AbstractSyntaxTree tree;
The tree is created by a call to buildTree(), the external interface of the Abstract Syntax Tree parser, at construction time.

Dataset &dataset;
This field provides a reference to the Dataset object the interpreter is to operate on. It is set by the constructor.

Value returnValue;
This field is used as a temporary location to store the return value from the script. If a return statement with an argument expression is encountered, the expression is evaluated and the result is stored here.


Enumeration

Outcome
The Outcome enumeration is used to encode the result of executing a statement. The result indicates whether the statement executed to completion or whether it encountered a break or continue statement within a nested statement that was not part of a loop.

The enumeration identifies four cases:

normal
The statement executed was neither a break, continue, nor return statement, nor was a return statement encountered in any nested statement, nor was a break or continue statement encountered in a nested statement that was not itself inside a nested loop. Note that a break or continue statement affects only the innermost loop that contains it. A return statement, on the other hand, causes all loops that contain it to terminate.

If the outcome of executing a statement is normal, execution of statements can continue normally.

breakEncountered
A break statement was encountered immediately or in a nested block. The current loop in effect is to be aborted and execution is to continue at the first statement following the loop statement.

continueEncountered
A continue statement was encountered immediately or in a nested block. The remainder of the current loop is to be skipped and execution is to continue with the loop condition if the loop is a while or do-while statement, or with the increment field if the loop is a for statement.

returnEncountered
A return statement was encountered immediately or in a nested block. No further statements are to be executed.


Private Member Functions

Value evaluate(const AstNode *node);
The evaluate() function is called to obtain the value of the expression the node represents. The function does this by calling itself recursively for each of its subnodes and then performing the operation indicated.

The evaluate function requires access to the Dataset object the interpreter is using. This is accessed as a member field. If evaluate() were written as an external function, rather than as a member of the Interpreter class, it would have to pass the data reference along in the calling sequence on every call to a subnode.

Note that execution of the evaluate() function is handled in a try-catch block. This allows errors reported by the Value class to be passed through the reportError() function which attaches the context information for the current node to the diagnostic message.

The evaluate() function is implemented as a switch statement controlled by the type of the node. For any node type not explicitly listed in the switch statement, the function returns an uninitialized Value object. Otherwise, evaluation is carried out as follows:

typeCommaExpression
The left operand is evaluated and the result is discarded. The right operand is evaluated and its value is returned as the value of the expression.

typeAssignment
The lvalue expression is evaluated, then the expression on the right side of the assignment. A switch statement is then used to determine the appropriate memory reference operation to perform. The return value is the result of this operation.

typeConditionalExpression
The condition expression is evaluated. If it evaluates to true, the onTrue expression is evaluated and its value returned. Otherwise, the onFalse expression is evaluated and its value returned. Notice that the expression that is not selected is not executed.

typeLogicalOrExpression
The LogicalOrExpression case uses "short cut" evaluation. That is, the left operand is evaluated. If it evaluates to true, its value is returned and the right operand is ignored. Otherwise the right operand is evaluated and its value is returned. Thus the right operand is evaluated only if the left operand is false.

typeLogicalAndExpression
The LogicalAndExpression case uses "short cut" evaluation. That is, the left operand is evaluated. If it evaluates to false, its value is returned and the right operand is ignored. Otherwise the right operand is evaluated and its value is returned. Thus the right operand is evaluated only if the left operand is true.

typeBinary
The Binary node includes all the binary operators used in C plus the exponentiation operator **. The left and right operands are evaluated, and then a switch statement is used to select the correct binary operation to perform.

typeUnary
The Unary node includes all of the data fetch operators plus all the unary operators used in C. The operand is evaluated, and then a switch statement is used to select the correct unary operation to perform.

typeConstant
Returns the value of the constant.

typeVariable
This node is used not only for a simple reference to a variable, but also for the autoincrement and autodecrement operations. Note that this node returns the value of the variable, not a reference, so that it can not be used as the destination of an assignment statement.

typeFunctionCall
The argument expressions are evaluated one by one and the results stacked. Then callFunction() is invoked which, in turn, identifies the actual function to be called and calls it. If no function with the given name and the given number of arguments is found in the function table, callFunction throws an InterpreterError exception.

Outcome execute(const AstNode *node);
The execute() function is called to execute a single statement. If the statement controls other statements, as is the case with conditional and loop statements, execute() calls itself recursively to execute the controlled statements as necessary.

The execute() function, like the evaluate() function, requires access to the Dataset object that contains the values of the variables the script is working on. Note that if the execute() function were implemented as a global function, rather than as a class member, dataset would have to be passed as a function parameter.

The return value of the execute() function is a member of the Outcome enumeration. The return value determines whether execution can continue with the next statement or whether a break, continue or return statement has been encountered.

The execute() function is implemented as a switch statement controlled by the type of the node. Evaluation is carried out as follows:

default
The default case handles any node type not explicitly listed in the switch statement. The execute() function is called recursively to execute each of the subnodes of the node in order, so long as the Outcome is normal. Otherwise, the outcome of the subnode statement is returned as the outcome of the node statement.

typeStatementBlock
This node is implemented as a list of statement nodes. The statements in the list are executed sequentially so long as the Outcome is normal. If the Outcome indicates a break, continue or return statement was encountered, no further statements in the list are executed and the appropriate Outcome is returned to indicate this fact to the calling function.

typeExpressionStatement
The evaluate() function is called to evaluate the expression. The result is discarded. Note that in the C/C++ tradition, the assignment that is the point of most expression statements is a "mere side effect" of evaluating the expression. The Outcome of an expression statement is always normal.

typeIfStatement
The condition expression is evaluated. If the condition is true, the statement is executed. The return value is the Outcome of the statement execution, since the statement could be a break, continue or return statement or it could be a block statement containing a break or continue statement. If the condition is false, nothing further is done and the return value is normal.

typeIfElseStatement
The condition is evaluated and tested. Depending on whether it is true or false the corresponding statement is executed and the resulting Outcome becomes the return value.

typeWhileStatement
The loop statement is executed repeatedly as long as the condition is true and no break or return statement is encountered.

If the loop terminates with a return statement, the Outcome is set to returnEncountered. Otherwise, the Outcome is specified as normal since break statements inside the loop do not affect any outer loop.

typeDoStatement
The loop statement is executed repeatedly until a break or return statement is encountered or the condition becomes false.

If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as normal since break statements inside the loop do not affect any outer loop.

typeRepeatStatement
The loop statement is executed repeatedly until a break or return statement is encountered or the condition becomes true. Note that the PLL syntax does not support the break statement.

If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as normal since break statements inside the loop do not affect any outer loop.

typeForStatement
The for statement is the most complicated single executable statement in C. It is characterized by three expressions and a statement which comprises the body of the loop. The expressions consist of an initialization expression, a condition expression, and an increment expression.

The first thing required in executing a for loop is to evaluate the initialization expression. Execution of the loop proper begins with evaluation of the condition expression. The condition expression is evaluated before execution of the statement, and the loop continues only if the condition returns true. At that point the statement is executed. If the statement Outcome is breakEncountered or returnEncountered then the entire for statement is terminated. Otherwise, the increment expression is evaluated, and the loop continues with the evaluation and testing of the condition expression.

If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as normal since break statements inside the loop do not affect any outer loop.

typeSimpleFor
This case handles the type of for loop defined in the PLL syntax. This syntax is noticeably simpler than the C-style for loop implemented by the typeForStatement case above.

The loop is executed in two stages. First, all the loop parameters are evaluated. Then the repetition count is calculated. The body of the loop is then executed the requisite number of times, with the index variable incremented after each execution. Note that this approach of calculating a repetition count is a much easier approach than testing the index variable after each execution of the body of the loop, since there are no complications associated with the sign of the increment.

Note that this implementation supports break statements even though they are not in the PLL syntax.

If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as normal since break statements inside the loop do not affect any outer loop.

typeBreak
The Outcome of the statement is set to breakEncountered.

typeContinue
The Outcome of the statement is set to continueEncountered.

typeDump
The Dump node contains a list of variables to be printed. The name and value of each variable in the list is printed on a separate line.

typePrint
The Print node contains a list of expressions to be printed. Each expression is evaluated and the Value::print() function called for each.

typeReturn
The return value, if any, is evaluated and stored in returnValue. The outcome of the statement is set to returnEncountered.

void updateDictionary(AstNode *node);
The updateDictionary function is a function which walks the entire tree looking for variable names. Every variable name found is added to the dictionary. This function represents a small degree of optimization. If it were not used, the data array would be resized each time a new variable was encountered in the course of the parse. By using this function, the data array can be sized adequately before the actual execution takes place.

Only one node type, VariableNode, actually contains a variable name. The switch statement contains a special cases for this type. Otherwise, there is one case for ListNode types and a default case. In both these cases, updateDictionary() is simply called recursively for each listed subnode.


Constructor/Destructor

Interpreter(char *text, Dataset &d);
The constructor invokes the buildTree() function, the external interface of the Abstract Syntax Tree parser, to create an abstract syntax tree from the text string. The constructor also records a reference to the Dataset object.

~Interpreter();
There is nothing for the destructor to do.


Public Member Function

Value execute();
When the public execute() function is invoked, the script has already been parsed and a reference to the root node of the tree stored in the tree object. execute() first calls updateDictionary() to walk the tree to find variable names and enter them in the dictionary. When the first data access is made to the dictionary, the data array will be resized to accommodate all the variables used by the script. Finally it calls the private execute() function on the root node of the tree to actually execute the script. When this function returns, returnValue is returned as the value of the function.



Table of Contents | Parsifal Software Home Page


XIDEK
Extensible Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.