Extending Interpreters:
Adding Structures and Objects
As one illustration of the methods involved in extending interpreters, a set of
interpreters has been created which adds structures and objects to the
features provided by the baseline interpreters. Source code for these
extended interpreters can be found in cll\struct,
pll\struct and support\struct.
It is perhaps misleading to speak of changes to the interpreters, since
adding support for structures and objects required only that code be
added to the interpreters. In no case was it necessary to modify
existing code. New rules have been added to the syntax. The same rules were
added to both the CLL and the PLL languages. New code to implement the new features
has been added to the support code.
It is easy to imagine many ways to implement structures and objects. The
technique used here is based more or less on the C language struct syntax
and is used to extend both the PLL and the CLL languages. It differs slightly from the model, since
neither of the example languages supports general type declarations.
The approach then is to use a simple C-like struct declaration statement in
the general form
struct <struct name> {<field name>...} <variable name>...;
where any one of the three components of the declaration may be omitted.
There are thus four different forms of the struct statement, e.g.:
struct widget {x, y, z} foo, bar;
-
This is the most general form of the structure definition. It defines a
structure with the name widget. Each instance of the structure
contains three fields, x, y, z. Two
instances, foo and bar, are also defined.
The structure definition is saved as the value of the variable
widget. The definition may be assigned to another variable:
newWidget = widget;
newWidget in turn names a structure and can be used in
definitions of structure instances.
struct widget {x, y, z};
-
This form of the structure statement simply saves the definition for
later use.
struct {x, y, z} foo, bar;
-
This form of the structure statement creates two structure instances,
foo and bar, each with three member fields, but
does not save the structure definition.
struct widget foo, bar;
-
This form of the structure statement defines two instance, foo
and bar, of the previously defined structure, widget.
In the extensions, the Value class is
extended so that the value of any variable can be a struct
definition or an instance of a struct. Such values may be assigned, so that
it is possible to have nested structures. For example, here is
an instance of a linked list:
struct Link{next, data} list; // define struct, instance
list.next = 0; // set next pointer to 0
struct Link link; // create new link
link.data = 17; // set data field
link.next = list.next; // set next field
list.next = link; // patch into list
The most obvious changes to support structures and objects are the changes to the syntax. The changes to the
supporting code are somewhat more extensive.
The first and largest set of changes to supporting code is to the common definitions module. These changes include the
definition of an Instance class, to
encapsulate objects, that is, specific instances of structures, and additions to the Value class. These modifications are
sufficient to provide support for the direct execution interpreters. They are also necessary
for all the other interpreter architectures.
The second set of changes, to the bytecode
interpreter module, added to the changes to the common definitions,
suffices to support the direct compilation interpreters.
These changes are also necessary for the astci interpreter.
The changes to the opcode enumeration are
also necessary for the astxi interpreter.
A small set of changes to the definitions of the
AstNode class is also required for
both the astxi and astci
interpreters.
Finally, a small set of changes to the
tree walk functions of the astxi and astci is required to add structures and objects to the
abstract syntax tree interpreters.
It should be noted that the modifications to the interpreter syntax files
did not require changes to any of the previously existing rules, reduction
procedures, or embedded C. All of the changes amount to
adding new tokens, rules, and support functions.
To add structure and object support to the syntax, it was necessary to add
two new tokens to the grammar:
name list
-
A name list is a comma delimited list of names. The name list token is
used both to specify a list of member field names in a structure and to
specify the variables which are to be assigned instances of the structure.
The name list token is identical to the name
list token defined in the Function
Definitions extension.
Note that the name list token could have been used to implement the dump statement.
(AgStack<AgString>) name list
-> name:n =AgStack<AgString>().push(n);
-> name list:stack, ',', name:n =stack.push(n);
The same implementation of the name list is used in dxi, dci and ast parsers. To make sure that
the AgStack
object is properly constructed on the parser stack, a wrapper statement has been added to the
configuration section.
It should be noted that the name list token is identical to the
like named token defined in the function
definitions extension.
structure declaration
-
The structure declaration is used both to define structures and to declare
instances of a given structure. The structure syntax is the
same as in C except that the member fields in the structures do not have
type specifications. The content of a structure is given simply by a comma
delimited list of field names. The member variable names are local to the
structure, so that identically named variables in different structures are
different variables.
structure declaration
-> "struct", name,'{', >name list, '}', name list
-> "struct", name, '{', name list, '}'
-> "struct", '{', name list, '}', name list
-> "struct", name, name list
The reduction procedures for these rules depend, of course, on the
interpreter architecture. They do not, however, depend on the choice of
language, CLL or PLL.
Here is how they are implemented:
- dxi
-
The local defineStructure()
and defineInstances()
functions are used to set the values of the specified structure and
instance variables.
- dci
-
The local codeStructDecl()
and codeInstanceDecl()
functions are used to create code fragments which will set the values
of the specified structure and instance variables.
- ast
-
Each of the four productions creates and returns a typeStructureDecl AstNode object.
In order to integrate the structure syntax with the rest of the grammar, two
additional rules have been added to the grammar. One defines a structure
definition statement:
simple statement
-> structure declaration, ';'
The second rule provides access to member fields of structures, exactly
as in C.
lvalue
-> lvalue, '.', name
Note that the definition is recursive, so that if a member field is itself
an instance of a structure, its members are also accessible. The reduction
procedure depends on the interpreter architecture, although not on the
language itself. Here is how they are implemented:
- dxi
-
The reduction procedure uses the getMember()
function to return an lvalue which points to the appropriate member field.
- dci
-
The reduction procedure creates a CodeFragment object consisting of a
MEMBER opcode and a parameter which is the index of
the member name in the fieldNames
dictionary.
- ast
-
The reduction procedure creates and returns a typeMember AstNode object.
One configuration statement has been added:
wrapper {AgStack<AgString>}
This provides proper C++ handling of the AgStack object on the parser value
stack.
The dci parser requires one new field in the parser control block.
To simplify reduction procedures, several new methods have been added to the
parser control block in the dxi and dci
parsers. No new methods were required for the ast parser.
Three new methods have been added to the parser control block to implement
structures and objects:
void defineStructure(const AgString &structName,
const AgStack<AgString> &fieldNames);
-
This function defines a structure with the specified member fields and
assigns the value to the variable structName. Any previous
value of structName will be lost.
void defineInstances(const AgString &structName,
const AgStack<AgString> &instanceNames);
-
This function defines instances of the specified
structure. If any of the named variables has been previously used, its
previous value is lost.
void defineInstances(const AgStack<AgString> &fieldNames,
const AgStack<AgString> &instanceNames);
-
This function defines instances of the specified
anonymous structure. If any of the named variables has been previously
used, its previous value is lost.
One new data field and five new methods have been added to the
parser control block to implement structures and objects:
-
AgDictionary<AgString>
fieldNames;
-
-
CodeFragment codeMember(const AgString &);
-
This function generates code to identify the named member of an instance
of a structure. The code will leave an lvalue on the runtime stack.
It is assumed that code will have previously been generated to push
an lvalue identifying an instance.
-
CodeFragment codeStructDecl(const AgString &name,
const AgStack<AgString> &fields,
const AgStack<AgString> &instanceNames);
-
This function generates code to create a structure definition with the
given fields and store it in the variable given by name. Instances of
this structure are then created and stored in the variables given by
the instanceNames list.
-
CodeFragment codeStructDecl(const AgString &name,
const AgStack<AgString> &fields);
-
This function generates code to create a structure definition with the
given fields and store it in the variable given by name. No
instances are created.
-
CodeFragment codeInstanceDecl(const AgStack<AgString> &name,
const AgStack<AgString> &instanceNames);
-
This function generates code to create instances of the structure
identified by name and store them in the variables listed by
instanceNames .
-
CodeFragment codeInstanceDecl(const AgString &fields,
const AgStack<AgString> &instanceNames);
-
This function generates code to create a structure definition with the
given fields and then create instances of it which are
stored in the variables given by the instanceNames list.
The Instance class serves to encapsulate an instance of a struct defined
using the struct statement. It consists of a constant
AgDictionary<AgString>
object which names the member fields of an object and an
AgArray<Value>
object which contains the values that correspond to the names in the dictionary.
const AgDictionary<AgString> dictionary;
-
Contains the names of the member fields of the struct of which this object
is an instance. Note that the dictionary is declared to be constant so that
new entries cannot be made once the object has been constructed. Note also
that if several instances of a single struct are created, all will share
the same actual dictionary.
AgArray<Value> data;
-
This array holds the data corresponding to the member field names in
dictionary. The values are
initialized to uninitType.
Instance(const AgDictionary<AgString> &);
-
The primary constructor takes a reference to a dictionary which defines
the member fields of the struct of which this object is an instance.
Instance(const Instance &);
-
The copy constructor exists simply for completeness.
~Instance();
-
There is nothing for the destructor to do. The dictionary and the data
array both have destructors which will take care of freeing storage when
it becomes necessary.
Value &getMember(const AgString &n);
-
Returns a reference to the value of specified field. The value can
be either read or written. If there is no field with the given name,
the function throws an ErrorMessage exception.
AgString asString() const;
-
Creates and returns a string representation of the object.
int hash(int startValue = 0);
-
This function calculates a hash code for the instance using the
hash functions for the dictionary and the array. Creating a hash
code allows the use of Instance objects in
AgSet,
AgMap and
AgDictionary
containers.
int operator == (const Instance &);
int operator != (const Instance &);
-
Two instances of a struct are equal if they have the same member
variables, in the same order, and the variables have the same
values. The equality operator supplements the hash function to
allow use of Instance objects in
AgSet,
AgMap and
AgDictionary
containers.
The changes to the Value class required to support structures
and objects are as follows:
-
New types, structureType and
instanceType have been added to the
Type enumeration. Structures
are represented by an AgDictionary<AgString> object, a
dictionary of strings, which name the member fields of the
structure. Instances are represented by Instance class objects.
-
Since both the AgDictionary and the Instance class objects have
constructors and thus require the use of the ValueWrapper class,
entries have been added to the anonymous
union to guarantee adequate space for the
AgDictionary<AgString> object
and the Instance object:
char dictionarySpace
[sizeof(ValueWrapper<AgDictionary<AgString> >)];
char objectSpace[sizeof(ValueWrapper<Object>)];
It is good practice to include such a sizing entry for each class that
uses the ValueWrapper class, since
not all classes can be counted upon to fit within a double.
-
A constructor has been added to the
Value class that will create a structureType Value object
containing a given AgDictionary<AgString>.
-
A constructor has been added to the
Value class that will create an instanceType Value object containing
a given instance of the Instance class.
-
structureType and
instanceType cases have been added
to the switch statements in the following functions (see Added Cases):
-
In addition to the AgDictionary and Instance constructors,
the following new functions have been added to the Value class to support
the above modifications (see Added
Functions):
-
String constants for error messages corresponding to new error
modes have been added:
- Defining a struct instance using a variable that is not a struct.
- Accessing a member field of a variable which is not an object
- Accessing an undefined member field.
Value &setValue(const Value &v);
-
In the setValue() function, used by constructors and the assignment
operator, cases corresponding to the
structureType and the
instanceType
have been added. Both use the placement new operator of the ValueWrapper class to store the
appropriate object in the space provided by the anonymous union.
~Value();
-
In the destructor, the wrappers for the AgDictionary object and the
Instance object must be deleted as necessary.
Since both of these objects are reference counted, the actual data will
only be deleted if this is the last reference to it.
int hash(int) const;
-
The structureType case passes
along the AgDictionary hash value, the instanceType passes along the
Instance hash value.
AgString asString() const;
-
The structureType and
instanceTypecases are added to
make appropriate AgString representations of the structure definition
or instance values.
- Value &operator = (const Value &v);
-
Cases have been added to delete the ValueWrapper
containing the AgDictionary or Instance object.
Deleting the old value is exactly the same as in the destructor.
int operator == (const Value &v) const;
-
The structureType case determines
that two structures are equal if and only if the dictionaries are equal,
that is, the structures have like named fields in the same order.
The instanceType case is simply
passed through to the Instance class equality operator.
Value(const AgDictionary<AgString> &);
-
This constructor creates a structureType Value
object.
Value(const Instance &);
-
This constructor creates an instanceType Value
object.
void assertStructure() const ;
-
Throws an ErrorMessage exception
if the Value object is not a structure.
void assertInstance() const ;
-
Throws an ErrorMessage exception
if the Value object is not an instance of a structure.
const AgDictionary<AgString> &getDictionary() const;
-
If the Value object is a structure, this function returns a reference to
the AgDictionary object that defines it. Otherwise, the function
throws an ErrorMessage exception.
Instance &getInstance() const;
-
If the Value object is an instance of a structure, this function returns
a reference to the Instance object that defines it. Otherwise, the
function throws an ErrorMessage
exception.
Value &getMember(const AgString &n) const ;
-
If the Value object is an instance of a structure, this function returns
a reference to the value of the named member field, if it exists.
Otherwise, the function throws an ErrorMessage exception.
Extending the bytecode interpreter, whether for
the CLL or the PLL language, to
handle structures and objects requires the following changes:
-
Two new opcodes, INSTANCE and MEMBER, have been added to the
opcode enumeration.
-
A constructor has been added to the Constant class, to correspond
to the structureType
constructor added to the Value class. Such structureType
constants are used to initialize structure variables. No corresponding
instanceType constructor is
required, since there is no mechanism in the language to specify a
constant instance of a structure.
-
An array of strings, fieldNames, has
been added to the ScriptMethod class, to provide for identifying member
fields of structures. The parameter to the MEMBER opcode has to be an
integer, preferably a small integer. During parsing, a string dictionary
is used to identify field names used in structures. Since only the
mapping from indices to names is needed at runtime, the contents of the
parse time dictionary are used to initialize the array used by the
ScriptMethod class.
It would be possible to identify field names in the main dataset
dictionary, but this would have the disadvantage of adding all member
field names to the global name space. Note that the index in the
fieldNames dictionary is not used to index storage. Instead,
it is simply used by the MEMBER instruction to locate the name of the
member field, which is then looked up in the dictionary that belongs to
the structure.
-
Ascii representations of the INSTANCE and MEMBER opcodes have been added
to the opcodeText[] array.
-
An initializer for fieldNames has been added
to the ScriptMethod constructors.
-
Addition of a MEMBER case to the ScriptMethod::list() function, in order
to handle the field name correctly. No new case is required for INSTANCE
because it does not need any special treatment.
-
Addition of INSTANCE and MEMBER cases to the ScriptMethod::apply() function:
- INSTANCE
-
The INSTANCE instruction takes a structure description from the top element of the
stack and replaces the top of the stack with an Instance object, all of whose member fields are set to
uninitType.
- MEMBER
-
The operand of the MEMBER instruction is an index into the fieldNames array. The MEMBER
instruction pops an instance of a struct from the top of the stack,
retrieves the name of the field from the fieldNames array,
extracts the named member field from the object and pushes the value of
the member field back onto the stack. If the top of the stack is not an
instance of a struct or the fieldNames index does not
identify a field with in the struct, an ErrorMessage exception is thrown.
Extending the Abstract Syntax Tree definitions, whether for
the CLL or the PLL language, to
handle structures and objects requires the following changes:
-
Add new node types to the NodeType
enumeration to support the following types of nodes:
- typeStructureDecl
-
This node type represents a structure declaration.
- typeMember
-
This node type represents a member.
- typeName
-
This node type is used to encapsulate a name for use in a
typeMember node. It might appear at first glance that a
typeVariable node might be sufficient, but to do so would add each such
name to the global name space. It is actually implemented as a
class, NameNode, derived from AstNode.
- typeNameList
-
Identifies a node that contains a list of names. It is implemented
as a class, NameListNode, derived from AstNode.
-
Add structs, subnodesStructureDecl and
subnodesMember, to the anonymous
union of the AstNode class to allow access to the subnodes by name.
-
Define specialized node classes, NameNode and NameListNode, derived from
AstNode to encapsulate the
typeName and typeNameList nodes. These classes are required since these
are leaf nodes of the syntax tree.
-
Add subnode counts for the newly defined node types to the nSubnodes[] array.
-
Implement constructors for the newly defined node classes, NameNode and
NameListNode.
-
Define wrapper functions,
nameNode() and nameListNode(), to simplify
creation of typeName and typeNameList nodes.
The changes to the two abstract syntax tree interpreters, astxi and astci, are essentially
trivial.
The only changes to astxi are the addition of a typeStructDecl
case to the switch statement in Interpreter::execute() and
the addition of a typeMember case to the switch statement in
Interpreter::evaluate(). Of course,
this interpreter also depends on the changes made to the common definitions,
the changes to the opcodes.h enumeration, and the changes to the abstract
syntax tree parser.
The changes to astci are only slightly more complex:
-
A string dictionary, fieldNames, has
been added to the Compiler class.
The dictionary is used to identify member fields of structures. It is
passed on to the ScriptMethod class for use by the bytecode interpreter.
-
Two new cases have been added to the switch statement in Compiler::compile():
- typeStructureDecl
-
The one case is adequate to handle all four productions of the
structure declaration. It begins by identifying the fieldNames and
instanceNames lists. It then declares an empty CodeFragment
object, structureDef, which will be set to contain code that
fetches an appropriate structure definition to the top of the runtime stack.
If fieldNames is not empty,
a Constant is created which contains a dictionary that identifies the
field names. Code is added to structureDef to fetch the
value of this constant.
If a structure name has been provided, then the appropriate behavior
depends on whether there was a list of field names. If so, code is
generated to save the structure definition in the appropriate variable.
Otherwise, code is added to structureDef to fetch the
structure definition from the named variable.
Note that the syntax of the structure declaration is such that if the
list of field names is empty, there must be a structure name.
Finally, for each name in the instanceNames list, code is
generated to fetch the structure definition, using the code in
structDef, and an Instance
object is created and stored in the appropriate variable.
- typeMember
-
This case identifies the name of the member field in fieldNames and generates a
MEMBER instruction.
-
An initializer for fieldNames has
been added to the ScriptMethod constructor.
The value of fieldNames is then copied from the Compiler object.
|