astxi: Tree Execution Interpreters
The "Abstract Syntax Tree Execution" interpreter, support\base\astxi.cpp creates an abstract
syntax tree by using an Abstract Syntax Tree Parser,
then executes the statements in a script by traversing the tree. Two versions
of the astxi interpreter can be built by using either CLL syntax or PLL syntax. This is a
somewhat more complicated method, in terms of support code, than used by
the Direct Execution Interpreter, but it has the
distinct advantage of substantially greater simplicity in the execution of
loops.
If a single script must be executed several times, by using
this approach it is possible to arrange to parse the script only once,
thus avoiding parsing overhead on subsequent executions. Building an
abstract syntax tree is noticeably simpler than creating a bytecode interpreter,
as is done in the Direct Compilation Interpreter, and
provides some of the same performance benefits.
The astxi interpreter is encapsulated in the Interpreter class. The interpreter
interface function,
interpret(), is implemented by creating an
Interpreter object and then invoking the object's public member
function, execute().
The Interpreter class encapsulates the functions and the data necessary
to traverse an abstract syntax tree and execute the statements it describes.
Strictly speaking, this class is unnecessary. The functions in the class
could have been written as external functions. They would, however, have
had to have another argument in order to pass the reference to the Dataset object. Should any additional parameters become necessary, it would be
necessary to change all the function calls to accommodate them. By making
the functions members of a class and defining these parameters as class
members, the calling sequences are simplified and are unlikely to require
changes to accommodate program modifications.
AbstractSyntaxTree
tree;
-
The tree is created by a call to buildTree(),
the external interface of the Abstract Syntax Tree parser,
at construction time.
Dataset &dataset;
-
This field provides a reference to the Dataset object the interpreter
is to operate on. It is set by the constructor.
Value returnValue;
-
This field is used as a temporary location to store the return value from the
script. If a return statement with an argument expression is encountered,
the expression is evaluated and the result is stored here.
Outcome
-
The Outcome enumeration is used to encode the result
of executing a statement. The result indicates whether the statement
executed to completion or whether it encountered a break or continue
statement within a nested statement that was not part of a loop.
The enumeration identifies four cases:
normal
-
The statement executed was neither a break, continue, nor return statement,
nor was a return statement encountered in any nested statement, nor
was a break or continue statement encountered
in a nested statement that was not itself inside a nested loop. Note
that a break or continue statement affects only the innermost loop that
contains it. A return statement, on the other hand, causes all loops
that contain it to terminate.
If the outcome of executing a statement is normal, execution of
statements can continue normally.
breakEncountered
-
A break statement was encountered immediately or in a nested
block. The current loop in effect is to be aborted and execution is
to continue at the first statement following the loop statement.
continueEncountered
-
A continue statement was encountered immediately or in a nested block.
The remainder of the current loop is to be skipped and execution is to
continue with the loop condition if the loop is a while or do-while
statement, or with the increment field if the loop is a for statement.
returnEncountered
-
A return statement was encountered immediately or in a nested
block. No further statements are to be executed.
Value evaluate(const AstNode *node);
-
The evaluate() function is called to obtain the value of the
expression the node represents. The function does this by calling itself
recursively for each of its subnodes and then performing the operation
indicated.
The evaluate function requires access to the Dataset object
the interpreter is using. This is accessed as a member field. If evaluate()
were written as an external function, rather than as a member of the
Interpreter class, it would have to pass the data reference
along in the calling sequence on every call to a subnode.
Note that execution of the evaluate() function is handled
in a try-catch block. This allows errors reported by the
Value class to be passed through
the reportError()
function which attaches the context information for the
current node to the diagnostic message.
The evaluate() function is implemented as a switch statement
controlled by the type of the node. For any node type not explicitly
listed in the switch statement, the function returns an uninitialized
Value object. Otherwise, evaluation
is carried out as follows:
typeCommaExpression
-
The left operand is evaluated and the result is discarded. The
right operand is evaluated and its value is returned as the value of
the expression.
typeAssignment
-
The lvalue expression is evaluated, then the expression on the right
side of the assignment. A switch statement is then used to determine
the appropriate memory reference operation to perform.
The return value is the result of this operation.
typeConditionalExpression
-
The condition expression is evaluated. If it evaluates to true,
the onTrue expression is evaluated and its value returned. Otherwise,
the onFalse expression is evaluated and its value returned. Notice that
the expression that is not selected is not executed.
typeLogicalOrExpression
-
The LogicalOrExpression case uses "short cut"
evaluation. That is, the left operand is evaluated. If it evaluates to
true, its value is returned and the right operand is ignored. Otherwise
the right operand is evaluated and its value is returned. Thus the right
operand is evaluated only if the left operand is false.
typeLogicalAndExpression
-
The LogicalAndExpression case uses "short cut" evaluation.
That is, the left operand is evaluated. If it evaluates to false, its
value is returned and the right operand is ignored. Otherwise the right
operand is evaluated and its value is returned. Thus the right operand is
evaluated only if the left operand is true.
typeBinary
-
The Binary node includes all the binary operators used in C
plus the exponentiation operator **. The left and right operands are
evaluated, and then a switch statement is used to select the correct
binary operation to perform.
typeUnary
-
The Unary node includes all of the data fetch operators
plus all the unary operators used in C.
The operand is
evaluated, and then a switch statement is used to select the correct
unary operation to perform.
typeConstant
-
Returns the value of the constant.
typeVariable
-
This node is used not only for a simple reference to a variable, but
also for the autoincrement and autodecrement operations. Note that this
node returns the value of the variable, not a reference, so that it can
not be used as the destination of an assignment statement.
typeFunctionCall
-
The argument expressions are evaluated one by one and the results
stacked. Then
callFunction()
is invoked which, in turn, identifies the actual function to be called
and calls it. If no function with the given name and the given number of
arguments is found in the function
table, callFunction throws an
InterpreterError exception.
Outcome execute(const AstNode *node);
-
The execute() function is called to execute a single statement.
If the statement controls other statements, as is
the case with conditional and loop statements, execute() calls
itself recursively to execute the controlled statements as necessary.
The execute() function, like the evaluate() function, requires access to
the Dataset object that contains
the values of the variables the script is working on. Note that
if the execute() function were implemented as a global
function, rather than as a class member,
dataset would have to be passed as a function
parameter.
The return value of the execute() function is a member of the
Outcome enumeration. The return value
determines whether execution can continue with the next statement or whether
a break, continue or return statement has been encountered.
The execute() function is implemented as a switch statement
controlled by the type of the node. Evaluation is carried out as follows:
default
-
The default case handles any node type not explicitly listed in the
switch statement. The execute() function is called recursively
to execute each of the subnodes of the node in order, so long as the Outcome is normal. Otherwise, the outcome of the
subnode statement is returned as the outcome of the node statement.
typeStatementBlock
-
This node is implemented as a list of statement nodes.
The statements in the list are executed sequentially so long as the
Outcome is
normal. If the Outcome indicates a
break, continue or return statement was encountered, no further
statements in the list are executed and the appropriate
Outcome is returned to indicate this fact to the calling
function.
typeExpressionStatement
-
The evaluate() function is called
to evaluate the expression. The result is discarded. Note that in the
C/C++ tradition, the assignment that is the point of most expression
statements is a "mere side effect" of evaluating the expression. The
Outcome of an expression statement is always
normal.
typeIfStatement
-
The condition expression is evaluated. If the condition is true,
the statement is executed. The return value is the Outcome of the statement execution, since
the statement could be a break, continue or return statement or it could be a
block statement containing a break or continue statement. If the
condition is false, nothing further is done and the return value is normal.
typeIfElseStatement
-
The condition is evaluated and tested. Depending on whether it is true
or false the corresponding statement is executed and the resulting Outcome becomes the return value.
typeWhileStatement
-
The loop statement is executed repeatedly as long as the condition is
true and no break or return statement is encountered.
If the loop terminates with a return statement, the Outcome is set to returnEncountered. Otherwise, the Outcome is specified as normal since break statements inside the
loop do not affect any outer loop.
typeDoStatement
-
The loop statement is executed repeatedly
until a break or return statement is encountered or the condition becomes false.
If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as
normal since break statements inside the loop do not affect
any outer loop.
typeRepeatStatement
-
The loop statement is executed repeatedly
until a break or return statement is encountered or the condition
becomes true. Note that the PLL syntax does not
support the break statement.
If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as
normal since break statements inside the loop do not affect
any outer loop.
typeForStatement
-
The for statement is the most complicated single executable
statement in C. It is characterized by three expressions and a
statement which comprises the body of the loop. The expressions consist
of an initialization expression, a condition expression, and an increment expression.
The first thing required in executing a for loop is to evaluate the initialization
expression. Execution of the loop proper begins with evaluation of the condition
expression. The condition
expression is evaluated before execution of the statement, and
the loop continues only if the condition returns true. At that point
the statement is executed. If the statement Outcome is breakEncountered or returnEncountered then the entire for
statement is terminated. Otherwise, the increment expression is
evaluated, and the loop continues with the evaluation and testing of
the condition expression.
If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as
normal since break statements inside the loop do not affect
any outer loop.
typeSimpleFor
-
This case handles the type of for loop defined in the PLL
syntax. This syntax is noticeably simpler than the C-style for loop
implemented by the typeForStatement case above.
The loop is executed in two stages. First, all the loop parameters are
evaluated. Then the repetition count is calculated. The body of the
loop is then executed the requisite number of times, with the index
variable incremented after each execution. Note that this approach
of calculating a repetition count is a much easier approach than
testing the index variable after each execution of the body of the
loop, since there are no complications associated with the sign of the
increment.
Note that this implementation supports break statements even though
they are not in the PLL syntax.
If the loop terminates with a return statement, the Outcome is returnEncountered. Otherwise, the Outcome is specified as
normal since break statements inside the loop do not affect
any outer loop.
typeBreak
-
The Outcome of the statement is set to
breakEncountered.
typeContinue
-
The Outcome of the statement is set to
continueEncountered.
typeDump
-
The Dump node contains a list of variables to be printed. The
name and value of each variable in the list is printed on a
separate line.
typePrint
-
The Print node contains a list of expressions to be printed.
Each expression is evaluated and the
Value::print()
function called for each.
typeReturn
-
The return value, if any, is evaluated and stored in returnValue.
The outcome of the statement is set to
returnEncountered.
void updateDictionary(AstNode *node);
-
The updateDictionary function is a function which walks the
entire tree looking for variable names. Every variable name found is
added to the dictionary. This function represents a small degree of
optimization. If it were not used, the data array would be resized each
time a new variable was encountered in the course of the parse. By using
this function, the data array can be sized adequately before the actual
execution takes place.
Only one node type, VariableNode,
actually contains a variable name. The switch statement contains a special
cases for this type. Otherwise, there is one case for
ListNode types and a default case.
In both these cases, updateDictionary() is simply called
recursively for each listed subnode.
Interpreter(char *text, Dataset &d);
-
The constructor invokes the buildTree() function,
the external interface of the Abstract Syntax Tree parser,
to create an abstract syntax tree from the text string. The constructor
also records a reference to the Dataset object.
~Interpreter();
-
There is nothing for the destructor to do.
Value execute();
-
When the public execute() function is invoked, the script
has already been parsed and a reference to the root node of the tree
stored in the tree object. execute() first calls
updateDictionary() to walk
the tree to find variable names and enter them in the dictionary. When the
first data access is made to the dictionary, the data array will be
resized to accommodate all the variables used by the script. Finally it
calls the private execute()
function on the root node of the tree to actually execute the script.
When this function returns, returnValue is
returned as the value of the function.
|