Parsifal Software


XIDEK
Extensible Interpreter Development Kit
Reference documentation


Extending Interpreters:
Adding Structures and Objects

Introduction

As one illustration of the methods involved in extending interpreters, a set of interpreters has been created which adds structures and objects to the features provided by the baseline interpreters. Source code for these extended interpreters can be found in cll\struct, pll\struct and support\struct.

It is perhaps misleading to speak of changes to the interpreters, since adding support for structures and objects required only that code be added to the interpreters. In no case was it necessary to modify existing code. New rules have been added to the syntax. The same rules were added to both the CLL and the PLL languages. New code to implement the new features has been added to the support code.

It is easy to imagine many ways to implement structures and objects. The technique used here is based more or less on the C language struct syntax and is used to extend both the PLL and the CLL languages. It differs slightly from the model, since neither of the example languages supports general type declarations.

The approach then is to use a simple C-like struct declaration statement in the general form

  struct <struct name> {<field name>...} <variable name>...;
where any one of the three components of the declaration may be omitted. There are thus four different forms of the struct statement, e.g.:
struct widget {x, y, z} foo, bar;
This is the most general form of the structure definition. It defines a structure with the name widget. Each instance of the structure contains three fields, x, y, z. Two instances, foo and bar, are also defined.

The structure definition is saved as the value of the variable widget. The definition may be assigned to another variable:

newWidget = widget;
newWidget in turn names a structure and can be used in definitions of structure instances.

struct widget {x, y, z};
This form of the structure statement simply saves the definition for later use.

struct {x, y, z} foo, bar;
This form of the structure statement creates two structure instances, foo and bar, each with three member fields, but does not save the structure definition.

struct widget foo, bar;
This form of the structure statement defines two instance, foo and bar, of the previously defined structure, widget.

In the extensions, the Value class is extended so that the value of any variable can be a struct definition or an instance of a struct. Such values may be assigned, so that it is possible to have nested structures. For example, here is an instance of a linked list:

  struct Link{next, data} list;  // define struct, instance
  list.next = 0;                 // set next pointer to 0
  struct Link link;              // create new link
  link.data = 17;                // set data field
  link.next = list.next;         // set next field
  list.next = link;              // patch into list


Overview of Modifications

The most obvious changes to support structures and objects are the changes to the syntax. The changes to the supporting code are somewhat more extensive.

The first and largest set of changes to supporting code is to the common definitions module. These changes include the definition of an Instance class, to encapsulate objects, that is, specific instances of structures, and additions to the Value class. These modifications are sufficient to provide support for the direct execution interpreters. They are also necessary for all the other interpreter architectures.

The second set of changes, to the bytecode interpreter module, added to the changes to the common definitions, suffices to support the direct compilation interpreters. These changes are also necessary for the astci interpreter. The changes to the opcode enumeration are also necessary for the astxi interpreter.

A small set of changes to the definitions of the AstNode class is also required for both the astxi and astci interpreters.

Finally, a small set of changes to the tree walk functions of the astxi and astci is required to add structures and objects to the abstract syntax tree interpreters.


Syntax Modifications

It should be noted that the modifications to the interpreter syntax files did not require changes to any of the previously existing rules, reduction procedures, or embedded C. All of the changes amount to adding new tokens, rules, and support functions.


Tokens Added to the Grammar

To add structure and object support to the syntax, it was necessary to add two new tokens to the grammar:
name list
A name list is a comma delimited list of names. The name list token is used both to specify a list of member field names in a structure and to specify the variables which are to be assigned instances of the structure. The name list token is identical to the name list token defined in the Function Definitions extension.

Note that the name list token could have been used to implement the dump statement.

(AgStack<AgString>) name list
  -> name:n            =AgStack<AgString>().push(n);
  -> name list:stack, ',',  name:n   =stack.push(n);
The same implementation of the name list is used in dxi, dci and ast parsers. To make sure that the AgStack object is properly constructed on the parser stack, a wrapper statement has been added to the configuration section.

It should be noted that the name list token is identical to the like named token defined in the function definitions extension.

structure declaration
The structure declaration is used both to define structures and to declare instances of a given structure. The structure syntax is the same as in C except that the member fields in the structures do not have type specifications. The content of a structure is given simply by a comma delimited list of field names. The member variable names are local to the structure, so that identically named variables in different structures are different variables.
structure declaration
  -> "struct", name,'{', >name list, '}', name list
  -> "struct", name, '{', name list, '}'
  -> "struct", '{', name list, '}', name list
  -> "struct", name, name list
The reduction procedures for these rules depend, of course, on the interpreter architecture. They do not, however, depend on the choice of language, CLL or PLL. Here is how they are implemented:
dxi
The local defineStructure() and defineInstances() functions are used to set the values of the specified structure and instance variables.
dci
The local codeStructDecl() and codeInstanceDecl() functions are used to create code fragments which will set the values of the specified structure and instance variables.
ast
Each of the four productions creates and returns a typeStructureDecl AstNode object.


New Grammar Rules

In order to integrate the structure syntax with the rest of the grammar, two additional rules have been added to the grammar. One defines a structure definition statement:
  simple statement
    -> structure declaration, ';'
The second rule provides access to member fields of structures, exactly as in C.
  lvalue
    -> lvalue, '.', name
Note that the definition is recursive, so that if a member field is itself an instance of a structure, its members are also accessible. The reduction procedure depends on the interpreter architecture, although not on the language itself. Here is how they are implemented:
dxi
The reduction procedure uses the getMember() function to return an lvalue which points to the appropriate member field.
dci
The reduction procedure creates a CodeFragment object consisting of a MEMBER opcode and a parameter which is the index of the member name in the fieldNames dictionary.
ast
The reduction procedure creates and returns a typeMember AstNode object.


New Configuration Statement

One configuration statement has been added:
  wrapper {AgStack<AgString>}
This provides proper C++ handling of the AgStack object on the parser value stack.


New Parser Control Block Extensions

The dci parser requires one new field in the parser control block. To simplify reduction procedures, several new methods have been added to the parser control block in the dxi and dci parsers. No new methods were required for the ast parser.

Three new methods have been added to the parser control block to implement structures and objects:

void defineStructure(const AgString &structName, const AgStack<AgString> &fieldNames);
This function defines a structure with the specified member fields and assigns the value to the variable structName. Any previous value of structName will be lost.


void defineInstances(const AgString &structName, const AgStack<AgString> &instanceNames);
This function defines instances of the specified structure. If any of the named variables has been previously used, its previous value is lost.


void defineInstances(const AgStack<AgString> &fieldNames, const AgStack<AgString> &instanceNames);
This function defines instances of the specified anonymous structure. If any of the named variables has been previously used, its previous value is lost.


One new data field and five new methods have been added to the parser control block to implement structures and objects:

AgDictionary<AgString> fieldNames;

CodeFragment codeMember(const AgString &);
This function generates code to identify the named member of an instance of a structure. The code will leave an lvalue on the runtime stack. It is assumed that code will have previously been generated to push an lvalue identifying an instance.

CodeFragment codeStructDecl(const AgString &name, const AgStack<AgString> &fields, const AgStack<AgString> &instanceNames);
This function generates code to create a structure definition with the given fields and store it in the variable given by name. Instances of this structure are then created and stored in the variables given by the instanceNames list.

CodeFragment codeStructDecl(const AgString &name, const AgStack<AgString> &fields);
This function generates code to create a structure definition with the given fields and store it in the variable given by name. No instances are created.

CodeFragment codeInstanceDecl(const AgStack<AgString> &name, const AgStack<AgString> &instanceNames);
This function generates code to create instances of the structure identified by name and store them in the variables listed by instanceNames.

CodeFragment codeInstanceDecl(const AgString &fields, const AgStack<AgString> &instanceNames);
This function generates code to create a structure definition with the given fields and then create instances of it which are stored in the variables given by the instanceNames list.


The Instance Class

Introduction

The Instance class serves to encapsulate an instance of a struct defined using the struct statement. It consists of a constant AgDictionary<AgString> object which names the member fields of an object and an AgArray<Value> object which contains the values that correspond to the names in the dictionary.

Member Fields

const AgDictionary<AgString> dictionary;
Contains the names of the member fields of the struct of which this object is an instance. Note that the dictionary is declared to be constant so that new entries cannot be made once the object has been constructed. Note also that if several instances of a single struct are created, all will share the same actual dictionary.

AgArray<Value> data;
This array holds the data corresponding to the member field names in dictionary. The values are initialized to uninitType.

Constructors/Destructor

Instance(const AgDictionary<AgString> &);
The primary constructor takes a reference to a dictionary which defines the member fields of the struct of which this object is an instance.

Instance(const Instance &);
The copy constructor exists simply for completeness.

~Instance();
There is nothing for the destructor to do. The dictionary and the data array both have destructors which will take care of freeing storage when it becomes necessary.

Member Functions and Operators

Value &getMember(const AgString &n);
Returns a reference to the value of specified field. The value can be either read or written. If there is no field with the given name, the function throws an ErrorMessage exception.

AgString asString() const;
Creates and returns a string representation of the object.

int hash(int startValue = 0);
This function calculates a hash code for the instance using the hash functions for the dictionary and the array. Creating a hash code allows the use of Instance objects in AgSet, AgMap and AgDictionary containers.

int operator == (const Instance &);
int operator != (const Instance &);
Two instances of a struct are equal if they have the same member variables, in the same order, and the variables have the same values. The equality operator supplements the hash function to allow use of Instance objects in AgSet, AgMap and AgDictionary containers.


Modifications to the Value Class

Introduction

The changes to the Value class required to support structures and objects are as follows:
  1. New types, structureType and instanceType have been added to the Type enumeration. Structures are represented by an AgDictionary<AgString> object, a dictionary of strings, which name the member fields of the structure. Instances are represented by Instance class objects.

  2. Since both the AgDictionary and the Instance class objects have constructors and thus require the use of the ValueWrapper class, entries have been added to the anonymous union to guarantee adequate space for the AgDictionary<AgString> object and the Instance object:
    char dictionarySpace
      [sizeof(ValueWrapper<AgDictionary<AgString> >)];
    char objectSpace[sizeof(ValueWrapper<Object>)];
    
    It is good practice to include such a sizing entry for each class that uses the ValueWrapper class, since not all classes can be counted upon to fit within a double.

  3. A constructor has been added to the Value class that will create a structureType Value object containing a given AgDictionary<AgString>.

  4. A constructor has been added to the Value class that will create an instanceType Value object containing a given instance of the Instance class.

  5. structureType and instanceType cases have been added to the switch statements in the following functions (see Added Cases):

  6. In addition to the AgDictionary and Instance constructors, the following new functions have been added to the Value class to support the above modifications (see Added Functions):

  7. String constants for error messages corresponding to new error modes have been added:
    • Defining a struct instance using a variable that is not a struct.
    • Accessing a member field of a variable which is not an object
    • Accessing an undefined member field.


Added Cases

Value &setValue(const Value &v);
In the setValue() function, used by constructors and the assignment operator, cases corresponding to the structureType and the instanceType have been added. Both use the placement new operator of the ValueWrapper class to store the appropriate object in the space provided by the anonymous union.

~Value();
In the destructor, the wrappers for the AgDictionary object and the Instance object must be deleted as necessary. Since both of these objects are reference counted, the actual data will only be deleted if this is the last reference to it.

int hash(int) const;
The structureType case passes along the AgDictionary hash value, the instanceType passes along the Instance hash value.

AgString asString() const;
The structureType and instanceTypecases are added to make appropriate AgString representations of the structure definition or instance values.

Value &operator = (const Value &v);
Cases have been added to delete the ValueWrapper containing the AgDictionary or Instance object.

Deleting the old value is exactly the same as in the destructor.

int operator == (const Value &v) const;
The structureType case determines that two structures are equal if and only if the dictionaries are equal, that is, the structures have like named fields in the same order.

The instanceType case is simply passed through to the Instance class equality operator.


Added Functions

Value(const AgDictionary<AgString> &);
This constructor creates a structureType Value object.

Value(const Instance &);
This constructor creates an instanceType Value object.

void assertStructure() const;
Throws an ErrorMessage exception if the Value object is not a structure.

void assertInstance() const;
Throws an ErrorMessage exception if the Value object is not an instance of a structure.

const AgDictionary<AgString> &getDictionary() const;
If the Value object is a structure, this function returns a reference to the AgDictionary object that defines it. Otherwise, the function throws an ErrorMessage exception.

Instance &getInstance() const;
If the Value object is an instance of a structure, this function returns a reference to the Instance object that defines it. Otherwise, the function throws an ErrorMessage exception.

Value &getMember(const AgString &n) const;
If the Value object is an instance of a structure, this function returns a reference to the value of the named member field, if it exists. Otherwise, the function throws an ErrorMessage exception.


Bytecode Interpreter Modifications

Extending the bytecode interpreter, whether for the CLL or the PLL language, to handle structures and objects requires the following changes:
  1. Two new opcodes, INSTANCE and MEMBER, have been added to the opcode enumeration.

  2. A constructor has been added to the Constant class, to correspond to the structureType constructor added to the Value class. Such structureType constants are used to initialize structure variables. No corresponding instanceType constructor is required, since there is no mechanism in the language to specify a constant instance of a structure.

  3. An array of strings, fieldNames, has been added to the ScriptMethod class, to provide for identifying member fields of structures. The parameter to the MEMBER opcode has to be an integer, preferably a small integer. During parsing, a string dictionary is used to identify field names used in structures. Since only the mapping from indices to names is needed at runtime, the contents of the parse time dictionary are used to initialize the array used by the ScriptMethod class.

    It would be possible to identify field names in the main dataset dictionary, but this would have the disadvantage of adding all member field names to the global name space. Note that the index in the fieldNames dictionary is not used to index storage. Instead, it is simply used by the MEMBER instruction to locate the name of the member field, which is then looked up in the dictionary that belongs to the structure.

  4. Ascii representations of the INSTANCE and MEMBER opcodes have been added to the opcodeText[] array.

  5. An initializer for fieldNames has been added to the ScriptMethod constructors.

  6. Addition of a MEMBER case to the ScriptMethod::list() function, in order to handle the field name correctly. No new case is required for INSTANCE because it does not need any special treatment.

  7. Addition of INSTANCE and MEMBER cases to the ScriptMethod::apply() function:
    INSTANCE
    The INSTANCE instruction takes a structure description from the top element of the stack and replaces the top of the stack with an Instance object, all of whose member fields are set to uninitType.

    MEMBER
    The operand of the MEMBER instruction is an index into the fieldNames array. The MEMBER instruction pops an instance of a struct from the top of the stack, retrieves the name of the field from the fieldNames array, extracts the named member field from the object and pushes the value of the member field back onto the stack. If the top of the stack is not an instance of a struct or the fieldNames index does not identify a field with in the struct, an ErrorMessage exception is thrown.


Abstract Syntax Tree Modifications

Extending the Abstract Syntax Tree definitions, whether for the CLL or the PLL language, to handle structures and objects requires the following changes:
  1. Add new node types to the NodeType enumeration to support the following types of nodes:
    typeStructureDecl
    This node type represents a structure declaration.
    typeMember
    This node type represents a member.
    typeName
    This node type is used to encapsulate a name for use in a typeMember node. It might appear at first glance that a typeVariable node might be sufficient, but to do so would add each such name to the global name space. It is actually implemented as a class, NameNode, derived from AstNode.
    typeNameList
    Identifies a node that contains a list of names. It is implemented as a class, NameListNode, derived from AstNode.

  2. Add structs, subnodesStructureDecl and subnodesMember, to the anonymous union of the AstNode class to allow access to the subnodes by name.

  3. Define specialized node classes, NameNode and NameListNode, derived from AstNode to encapsulate the typeName and typeNameList nodes. These classes are required since these are leaf nodes of the syntax tree.

  4. Add subnode counts for the newly defined node types to the nSubnodes[] array.

  5. Implement constructors for the newly defined node classes, NameNode and NameListNode.

  6. Define wrapper functions, nameNode() and nameListNode(), to simplify creation of typeName and typeNameList nodes.


Abstract Syntax Tree Interpreter Modifications

The changes to the two abstract syntax tree interpreters, astxi and astci, are essentially trivial.

The only changes to astxi are the addition of a typeStructDecl case to the switch statement in Interpreter::execute() and the addition of a typeMember case to the switch statement in Interpreter::evaluate(). Of course, this interpreter also depends on the changes made to the common definitions, the changes to the opcodes.h enumeration, and the changes to the abstract syntax tree parser.

The changes to astci are only slightly more complex:

  1. A string dictionary, fieldNames, has been added to the Compiler class. The dictionary is used to identify member fields of structures. It is passed on to the ScriptMethod class for use by the bytecode interpreter.

  2. Two new cases have been added to the switch statement in Compiler::compile():
    typeStructureDecl
    The one case is adequate to handle all four productions of the structure declaration. It begins by identifying the fieldNames and instanceNames lists. It then declares an empty CodeFragment object, structureDef, which will be set to contain code that fetches an appropriate structure definition to the top of the runtime stack.

    If fieldNames is not empty, a Constant is created which contains a dictionary that identifies the field names. Code is added to structureDef to fetch the value of this constant.

    If a structure name has been provided, then the appropriate behavior depends on whether there was a list of field names. If so, code is generated to save the structure definition in the appropriate variable. Otherwise, code is added to structureDef to fetch the structure definition from the named variable.

    Note that the syntax of the structure declaration is such that if the list of field names is empty, there must be a structure name.

    Finally, for each name in the instanceNames list, code is generated to fetch the structure definition, using the code in structDef, and an Instance object is created and stored in the appropriate variable.

    typeMember
    This case identifies the name of the member field in fieldNames and generates a MEMBER instruction.

  3. An initializer for fieldNames has been added to the ScriptMethod constructor. The value of fieldNames is then copied from the Compiler object.



Table of Contents | Parsifal Software Home Page


XIDEK
Extensible Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.