Parsifal Software


XIDEK
Extensible Interpreter Development Kit
Reference documentation


Extending Interpreters:
Adding Arrays and Subscripts

Introduction

To illustrate the methods involved in extending interpreters, a set of interpreters has been created which adds arrays and subscripts to the features provided by the baseline interpreters. Source code for these extended interpreters can be found in cll\array, pll\array and support\array.

It is perhaps misleading to speak of changes to the interpreters, since adding support for arrays and subscripts required only that code be added to the interpreters. In no case was it necessary to modify existing code. New rules have been added to the syntax. The same rules were added to both the CLL and the PLL languages. New code to implement the new features has been added to the support code.

Although it may seem that arrays and subscripts should be part of the basic syntax supported by the kit, they have been left as an extension since there are a large number of ways to implement arrays and subscripts. It was felt that the baseline interpreters in the kit should be limited to those features which enjoy general agreement as to their implementation.

There are two basic ways that arrays can be implemented. One way is to simply set up an AgMap<Value, Value> object which would map any value to any other value. This approach would not require any sort of declaration and would allow statements of the form:

  x["foo"] = bar;
as well as conventional subscripts. Although this is an interesting approach, it is not what most people think of when they think of arrays and subscripts.

The approach actually chosen is to implement a simple array declaration:

  array foo[size];
where size can be an arbitrary integer expression. The expression is evaluated and an array of the specified size is assigned to the variable foo. The values in the array are initially set to uninitType. Elements of the array can now be set to arbitrary values.

To provide unity in the implementation, the Value class has been extended so that the value of any variable could be an array. Since each array is an array of Value class instances, it is possible to have array elements which are themselves arrays, thus providing the capability of multidimensional arrays. For example, here is a way to set up a three by three matrix.

array x[3];
for (i = 0; i<3; i++) {
  array temp[3];
  x[i] = temp;
}
for (i = 0; i<3; i++) {
  for (j = 0; j<3; j++) {
    x[i][j] = 1.0/(i+j+1);
  }
}

Since character strings are already arrays of characters, it seemed reasonable to extend subscripting to string valued variables as well. For example,

  x = "castle";
  y = x[2];
  x[2] = 't';
The value of y is 115, the ascii code for the letter 's'. Setting x[2] to 't' makes the value of x equal to "cattle".


Overview of Modifications

The most obvious changes to support arrays and subscripts are the changes to the syntax. The changes to the supporting code are somewhat more extensive.

The first and largest set of changes to supporting code is to the common definitions module. These changes include the definition of a CharPointer class, to represent pointers into character strings and additions to the Value class. These modifications are sufficient to provide array and subscript support for the direct execution interpreters. They are also necessary for all the other interpreter architectures.

The second set of changes, to the bytecode interpreter module, added to the changes to the common definitions, suffices to support the direct compilation interpreters. These changes are also necessary for the astci interpreter. The change to the opcode enumeration is also necessary for the astxi interpreter.

A small set of changes to the definitions of the AstNode class is also required for both the astxi and astci interpreters.

Finally, a small set of changes to the treewalk functions of the astxi and astci is required to add arrays and subscripts to the abstract syntax tree interpreters.


Syntax Modifications

To add array and subscript support to the syntax, it was necessary to add only two new rules to the grammar: an additional rule for the token simple statement:
simple statement
 -> "array", name, '[', expression, ']', ';'
and an additional rule for the token lvalue:
  lvalue
    -> lvalue, '[', expression:x, ']'
The first rule creates an array of specified size. The second implements subscripts. Note that since lvalue is recursively defined, it is possible to have multiple subscripts, as in C and C++.

The reduction procedures for these rules depend, of course, on the interpreter architecture. They do not, however, depend on the choice of language, CLL or PLL.


The CharPointer Class

Introduction

The CharPointer class has been added to represent lvalues which refer to character positions within a string. It enables scripts to write as well as read single characters within a string.

Member Fields

AgString &string;
A reference to the string that is the base of the subscript operation. Note that a reference can be used since the lvalues that give rise to a CharPointer value are transitory. They are created and deleted entirely within the scope of a single expression.

int index;
The index into the string specified by the subscript operator.

Constructors/Destructor

CharPointer(AgString &, int);
Creates a CharPointer object from a string and an index. If the index is out of bounds, an ErrorMessage exception is thrown.

CharPointer(const CharPointer &);
The copy constructor exists simply for completeness.

~CharPointer();
There is nothing for the destructor to do.

Member Function

char &value();
This function returns a reference to the indexed character within the string. The reference may be used either to read or to write the character value.


Modifications to the Value Class

Introduction

One facet of supporting arrays and subscripts is to modify the Value class so that a Value object may contain an AgArray<Value> object. Note that the AgArray template class uses value semantics.

Because the AgArray template class has a constructor, an AgArray object cannot be added to the anonymous union. Instead, the AgArray object has to be constructed by using the ValueWrapper class.

The changes to the Value class required to support arrays and subscripts are as follows:

  1. Add two new types, arrayType and charPointerType, to the Type enumeration.

  2. Add entries to the anonymous union to guarantee adequate space for the AgArray and CharPointer objects:
    char arraySpace[sizeof(ValueWrapper< AgArray<Value> >)];
    char charPointerSpace[sizeof(ValueWrapper<CharPointer>)];
    
    It is good practice to include such sizing entries for each class that uses the ValueWrapper class, since not all classes can be counted upon to fit within a double.

  3. Add constructors to the Value class that will create a Value object containing an array of a specified size or a CharPointer object

  4. Overload the subscript operator in the Value class so it can access an element of an array or a character in a string.

  5. Add arrayType and charPointerType cases to the switch statements in the following functions: Note that every switch statement that has a pointerType case also requires a corresponding charPointerType case.

  6. In addition to the new constructors and the subscript operator override, add the following new functions to the Value class to support the above modifications:

  7. Add string constants for error messages corresponding to new error modes:
    • Subscripting a variable whose value is not an array or string.
    • Subscript out of bounds.


Added Cases

Generally speaking, new cases for the arrayType and the charPointerType must be added to each switch statement in Value class functions. No charPointerType case is needed for those functions which are not applied to lvalues.

Two major classes of functions are the assignment operators and the increment/decrement operators.

Each assignment operator requires a new case to handle the charPointerType. These represent the situations where a subscripted string appears on the left side of an assignment operator. For example:

  x = "castle";
  x[2] = 't';
  x[1] |= 0x14;
The value of x is now "cuttle".

Each of the autoincrement and autodecrement operators also requires handling the charPointerType. These cases take care of situations like the following:

  x = "castle";
  y = x[2]++;
The value of x is now "cattle" and the value of y is 115, the ascii value of 's'.

In the following list of functions and operator overrides, only representative cases of the assignment and increment/decrement operators are provided.

Value &setValue(const Value &v);
In the setValue() function, used by constructors and the assignment operator, cases corresponding to the arrayType and the charPointerType have been added. Both use the placement new operator of the ValueWrapper class to store the appropriate object in the space provided by the anonymous union.

~Value();
In the destructor, the wrapper for the array object must be deleted. Since the array itself is reference counted, the actual array will only be deleted if this is the last reference to the array.

int hash(int) const;
The arrayType case calculates a hash value for the array, No hash value is required for the charPointerType case, since the lvalues cannot exist except in the course of evaluating an expression. Thus they cannot be entered into containers. Note that this would not be the case if the interpreter were to implement pointers.

const Value &deref();
The charPointerType case is added to fetch the character the pointer points to, turning the object into an integerType object.

AgString asString() const;
The arrayType case is added to make an appropriate AgString representation of the array.

No case is required for the charPointerType since lvalues are never output.

Value &operator = (const Value &v);
A case has been added for the charPointerType which dereferences the pointer and sets the value of the character pointed to, as well as the this object, to the value of the argument.

A case has also been added to delete the ValueWrapper containing the AgArray object.

Deleting the old value is exactly the same as in the destructor.

Value &operator += (const Value &v);
A case has been added for the charPointerType which dereferences the pointer and adds the value of the argument to the character pointed to. The value of this is set to the resulting sum and the type to integerType. All the remaining assignment operators are handled in a similar manner.

int operator == (const Value &v) const;
The arrayType case determines that two arrays are equal if and only if the arrays have the same size and all elements are equal.

Nothing is required for the charPointerType case, since comparisons are not done on lvalues.

int operator ++ (int);
The charPointerType case of the postincrement operator sets the value of the object to the unincremented value of the character pointed to, sets the type to integerType and increments the character pointed to.

The other autoincrement and autodecrement operators are handled in a similar fashion.


Added Functions

Value(const Value &, int);
When defining the constructor for an array object, one's first instinct is to define it to take a single integer argument. This, however, conflicts with the constructor for an integerType.

The next thing that comes to mind is to define the constructor to take a single Value object. This approach has the unpleasant property of conflicting with the copy constructor for the class.

The approach finally taken was to put a dummy argument in the calling sequence which is ignored. This is similar to the convention in C++ for distinguishing overrides for the pre-increment and post-increment operators. Thus the arguments to the constructor are a Value object, whose value determines the size of the array, and an integer whose value is utterly ignored.

Note that as implemented any scalar value, integerType or realType, may be used to specify the size of the array. If the scalar value is not integral, it is truncated, thus

array x[7.3];
creates an array of seven elements.

Value(const CharPointer &);
This constructor uses the placement new operator of the ValueWrapper to construct a CharPointer object in the space allocated by the anonymous union.

Value &operator [] (Value &);
Prior to actually doing the subscript operation, the function must first verify:
  • The value being subscripted is an array.
  • The subscript is a scalar, i.e. integerType or realType.
  • The subscript is not out of bounds.

void assertArray() const;
Throws an exception if the Value object is not an array.

AgArray<Value> &getArray() const;
getArray() returns a reference to the AgArray object contained in a Value object.

CharPointer<Value> &getCharPointer() const;
getCharPointer() returns a reference to the CharPointer object contained in a Value object. If the type of the object is not charPointerType an ErrorMessage exception is thrown.


Bytecode Interpreter Modifications

To extend the bytecode interpreter to handle arrays and subscripts requires only the addition and implementation of two opcodes:
ARRAY
The ARRAY instruction takes the size of the array from top element of the stack and replaces the top of the stack with a Value object initialized to contain an array of the specified size.

SUBSCRIPT
The SUBSCRIPT instruction pops the index from the top of the stack and subscripts what is now the top of the stack. The subscript operator in the Value class replaces the lvalue which identifies the array with the lvalue of the array element. This is true for both explicit arrays and for character strings.

Implementing these opcodes requires adding them to the opcode enumeration in opcodes.h, adding ascii representations for them to the opcodeText[] array in bcidefs.cpp, and, finally, adding two cases to the switch statement in the ScriptMethod::apply() function. No other changes to the bytecode interpreter definitions are required.


Abstract Syntax Tree Modifications

To extend the abstract syntax tree definitions to handle arrays and subscripts requires only addition of two new nodeTypes, typeArray and typeSubscript, to the node type enumeration. Corresponding structs, subnodesArray and subnodesSubscript, have been added to the anonymous union of the AstNode class to allow access to the subnodes by name. The final change was to add appropriate entries to the nSubnodes[] array. It should be noted that no changes at all were required to the AbstractSyntaxTree class.


Abstract Syntax Tree Interpreter Modifications

The changes to the two abstract syntax tree interpreters, astxi and astci, are essentially trivial.

The only changes to astxi are the addition of a typeArray case to the switch statement in Interpreter::execute() and the addition of a typeSubscript case to the switch statement in Interpreter::evaluate(). Of course, this interpreter also depends on the changes made to the common definitions, the changes to the opcodes.h enumeration, and the changes to the abstract syntax tree parser.

The only changes to astci are the addition of typeArray and typeSubscript cases to the switch statement in Compiler::compile().Of course, this interpreter also depends on the changes made to the common definitions, the changes to bytecode interpreter, and the changes to the abstract syntax tree parser.



Table of Contents | Parsifal Software Home Page


XIDEK
Extensible Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.