Parsifal Software


XIDEK
Extensible Interpreter Development Kit
Reference documentation


astci: Tree Compilation Interpreters

Introduction

The "Abstract Syntax Tree Compilation" interpreter, implemented in support\base\astci.cpp creates an abstract syntax tree using the Abstract Syntax Tree Parser and then compiles the statements in a script into bytecode by traversing the tree. Two versions of the astci interpreter can be built by using either CLL syntax or PLL syntax. This is a somewhat more complicated method, in terms of support code, than used by the Direct Compilation Interpreter, but it has the advantage of allowing multiple traversals of the tree and therefore more sophisticated code generation.

In this example, no attempt has been made to generate code for a more sophisticated virtual machine. Instead, to illustrate the facility obtained by using an abstract syntax tree, a small amount of code optimization has been performed by adding the capability of evaluating constant expressions at compile time.

The astci interpreter is implemented in terms of two classes: the Compiler class which creates bytecode from an AbstractSyntaxTree object, and the ScriptMethod class which interprets the bytecode.


ScriptMethod::ScriptMethod(const char *text, AgDictionary<AgString> &d);

Although other methods of the ScriptMethod class are defined in the bcidefs module, the constructor, which actually compiles the input script, is defined as part of the astci module.

The constructor is implemented in two stages. In the first stage, a Compiler object is created. The constructor for the Compiler class invokes the ast parser to create an abstract syntax tree which summarizes the script. The compile() method of the Compiler class is then invoked to traverse the tree and generate bytecode.


External Interface

The external interface for the interpreter begins by constructing a ScriptMethod object. The constructor actually parses the script and compiles the byte code. interpret() then invokes the ScriptMethod::apply() function to execute the bytecode. A call to the ScriptMethod::list() function, commented out for regression testing, may be used to create a listing of the generated bytecode.


Optimization

Compiling from an abstract syntax tree provides substantially greater flexibility than compiling directly at parse time, since it is possible to traverse the tree as many times as is desired. To illustrate this capability, the compiler has optional optimization code which evaluates constant expressions at compile time and simplifies the generated code accordingly. Optimization is controlled by the definition of the macro OPTIMIZE. For convenience, this macro has been predefined in astci.h. To turn off optimization, comment out the macro definition.

When the OPTIMIZE macro is defined, the doConstantArithmetic() function is defined and called as the first step in the compile() function. The function works by replacing any portion of the abstract syntax tree that represents a constant expression with a single ConstantNode. In addition, the typeIfStatement, typeIfElseStatement, and typeWhileStatement cases of the compile() function check for a constant condition value and take appropriate action.


The Compiler Class

Introduction

The Compiler class encapsulates the functions and the data necessary to traverse an abstract syntax tree and compile the statements it describes into bytecode.

Strictly speaking, this class is unnecessary. The functions in the class could have been written as external functions. They would, however, have had to have two extra arguments in order to pass the references to the dictionary and the constants. Should any additional parameters become necessary, it would be necessary to change all the function calls to accommodate them. Making the functions members of a class and defining these parameters as class members allows the calling sequences to be simplified and thereby simplifies program maintenance.


Private Member Fields

AbstractSyntaxTree tree;
The tree is created by a call to buildTree(), the external interface of the Abstract Syntax Tree parser, at construction time.

AgStringDictionary<AgString> &dictionary;
A reference to the dictionary used to identify variable names is initialized by the constructor. Notice that the dictionary must be externally defined. The bytecode created by this compiler can be used on any Dataset object that contains a reference to the same dictionary.

AgDictionary<Constant> constants;
The constants dictionary is used to identify and accumulate constants that are encountered in the script. Rather than complicate the bytecode interpreter by putting such constants inline as immediate constants, they are referenced in the bytecode by their index in the constants dictionary. A dictionary is used rather than a stack in order to guarantee that there is only one instance of any given constant, no matter how many times it appears in a script.


Private Member Functions

CodeFragment compile(const AstNode *node);
The compile function is called to generate bytecode for the specified node. It returns a CodeFragment object. If the node is not a leaf node, the compile() function calls itself recursively to compile its subnodes. It then combines these code fragments with specific glue code to implement the indicated functionality.

The function is implemented as a switch statement controlled by the type of the node. The treatment of specific nodes is as follows:

default
The default case handles all nodes not explicitly listed below. The compile() function is called recursively for each subnode. The resulting CodeFragment objects are simply concatenated and the concatenation is returned.

typeStatementBlock
The compile() function is called recursively for each node in the list. The resulting CodeFragment objects are simply concatenated and the concatenation is returned.

typeVariable
The name of the variable is entered into the dictionary if necessary and the dictionary index obtained. A CodeFragment is then returned which consists of a locate instruction and the dictionary index of the variable name.

typeDump
For each variable name in the list, the variable is identified in the dictionary and a CodeFragment containing a DUMP instruction is generated. If the variable is not in the dictionary, it is added to the dictionary and its value is set to uninitType. The code fragments are concatenated together and the final result is returned.

typePrint
Each expression in the list is compiled and a PRINT instruction appended. The code fragments are concatenated together and the final result is returned.

typeReturn
If an expression provided in the return statement, it is compiled and a RETURN instruction is appended. Otherwise, a simple HLT instruction is coded.

typeConstant
The constant value is identified in the constants dictionary. The dictionary index of the constant is the argument to the PUSHC instruction.

typeUnary
The code to compute the operand is compiled by a recursive call to compile(). The unary operator is then appended to the code thus created.

typeBinary
The code to compute the left operand is compiled by a recursive call to compile(). The code to compute the right operand is then created by a second recursive call and concatenated with the previously generated code. Finally, the binary operator itself is appended.

typeCommaExpression
The code to compute the left operand is compiled by a recursive call to compile(). A POP instruction is appended to discard the result of the left operand. The code to compute the right operand is then created by a second recursive call and concatenated with the previously generated code.

typeExpressionStatement
The code to compute the expression is compiled by a recursive call to compile(). A POP instruction is appended to discard the result.

typeAssignment
First, the code to evaluate the lvalue expression on the left side of the assignment is compiled by a recursive call to compile. Then the code to compute the expression on the right side of the assignment is compiled by another recursive call to compile() and concatenated with the lvalue code. Finally, the assignment opcode is appended.

typeConditionalExpression
Since there is no difference in the code generation between a conditional expression and an if-else statement, the condition and the two expression nodes are separately compiled and the ifElse() method of the CodeFragment class is invoked to finish the job.

typeLogicalOrExpression
Code for the right expression is compiled first in order to determine its size. Then code for the left expression is compiled. A LOR instruction is appended. The offset for the branch is the size of the code for the right expression. Finally, the code for the right expression is concatenated.

typeLogicalAndExpression
Code for the right expression is compiled first in order to determine its size. Then code for the left expression is compiled. A LAND instruction is appended. The offset for the branch is the size of the code for the right expression. Finally, the code for the right expression is concatenated.

typeIfElseStatement
If the OPTIMIZE flag is set and the condition expression has typeConstant, the value of the constant is interrogated and used to select one of the two subnodes. Thus, if the OPTIMIZE flag is set, the normal if-else statement can be used to control conditional compilation.

If the OPTIMIZE flag is not set or the loop condition is not constant, code for the three subnodes, the condition, the onTrue statement, and the onFalse statement, is compiled independently and passed to the CodeFragment::ifElse() function which combines these code fragments with the appropriate branch instructions to create a single code fragment.

typeIfStatement
If the OPTIMIZE flag is set and the condition expression has typeConstant, the value of the constant is interrogated. If it is false, an empty CodeFragment is returned. Otherwise the CodeFragment for the statement controlled by the if is returned. Thus, if the OPTIMIZE flag is set the normal if statement can be used to control conditional compilation.

If the OPTIMIZE flag is not set or the loop condition is not constant, code for the two subnodes, the condition and the statement controlled by the if, is compiled independently and passed to the CodeFragment::ifStatement() function which combines these code fragments with the appropriate branch instructions to create a single code fragment.

typeWhileStatement
If the OPTIMIZE flag is set and the condition expression has typeConstant, the value of the constant is interrogated. If it is false, an empty CodeFragment is returned. Otherwise the CodeFragment for the statement is checked for the presence of a break statement. If none is found, an error is reported. Otherwise, a continue statement is appended to the code, patchBranches() is invoked to substitute correct exit and continue branch offsets, and the code fragment is returned.

If the OPTIMIZE flag is not set, code for the two subnodes, the condition and the statement controlled by the while, is compiled independently and passed to the CodeFragment::whileLoop() function which combines these code fragments with the appropriate branch instructions to create a single code fragment.

typeDoStatement
The statement and the condition are separately compiled and then handed over to CodeFragment::doLoop to add the necessary branch instructions.

typeForStatement
The four different components of the for statement are separately compiled by recursive calls to compile(). The results are then handed over to CodeFragment::forLoop().

typeBreak
The function appendBreak() is called to append a branch statement to an empty CodeFragment object. Eventually, after this bit of code is concatenated with others, the patchBranches() function will fill in the operand field of the instruction with the correct offset to exit the loop.

Note that the parser will throw an exception if a break statement occurs outside a loop.

typeContinue
The function appendContinue() is called to append a branch statement to an empty CodeFragment object. Eventually, after this bit of code is concatenated with others, the patchBranches() function will fill in the operand field of the instruction with the correct offset to continue the loop.

Note that the parser will throw an exception if a continue statement occurs outside a loop.

typeFunctionCall
The argument expressions are compiled in turn and the code concatenated. Then the function table index of the function to be called is determined by calling idFunction(). Finally a CALL instruction is appended with the function table index as the parameter.

void doConstantArithmetic(const AstNode *node);
The doConstantArithmetic() function is incorporated in the compiler only if the OPTIMIZE switch is set. It is then called as the first statement of the compile() function.

On entry, doConstantArithmetic() calls itself recursively to handle constant arithmetic on all the subnodes of the current node. Then, the function uses a switch statement to support special handling for specific node types:

ListNode Nodes
A recursive call is made to handle each node in the list.

typeUnary
If the operand has a constant value, the value is fetched and the indicated operation is performed. The value of the constant node is set to the resultant value and the context is set to the context of the unary node. Finally, the unary node is replaced with the constant node.

There is no need to delete the now abandoned unary node since it was registered in the nodeStack belonging to the abstract syntax tree when it was created and will be properly deleted when the tree itself is deleted.

typeBinary
If both operands are constant, the indicated operation is performed leaving the result in the left operand node. The binary node is then replaced with the left operand node.

There is no need to delete the now abandoned binary node, since it was registered in the nodeStack belonging to the abstract syntax tree when it was created and will be properly deleted when the tree itself is deleted.


Constructor/Destructor

Compiler(char *text, AgDictionary<AgString> &d);
The constructor invokes the buildTree() function to create an abstract syntax tree from the text string. It also records a reference to the dictionary.

~Compiler();
There is nothing for the destructor to do.


Public Member Function

ScriptMethod compile();
If the OPTIMIZE switch is set, compile() begins by calling doConstantArithmetic() on the root node of the tree. Then it calls the private compile() function on the root node of the tree, and appends a HLT instruction to the code. Finally it returns a ScriptMethod object, constructed from the generated code, the dictionary and the constantList.



Table of Contents | Parsifal Software Home Page


XIDEK
Extensible Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.