Parsifal Software


XIDEK
Extensible Interpreter Development Kit
Reference documentation


Extending Interpreters:
Adding Function Definitions

Introduction

As one illustration of the methods involved in extending interpreters, a set of interpreters has been created which adds function definitions and calls to the features provided by the baseline interpreters. It also adds a local statement which provides for variables whose scope is effectively limited to a single script or function. Source code for these extended interpreters can be found in cll\function, pll\function and support\function.

Although adding support for arrays and structures required no changes to existing code, the same is not true for function definitions. The first reason is that the baseline interpreters already had support for calls to library functions. In this extension, the function call logic must be modified to distinguish between library functions and locally defined functions. Also, the implementation of the local statement required modifying the rule defining the "start" token of the grammar.

Otherwise, adding support for function definitions required only that code be added to the interpreters. New code to implement the new features has been added to the support code. New rules have been added to the syntax. Almost identical rules were added to both the CLL and the PLL languages. The only difference is the use of begin/end to mark the body of a function in the PLL syntax and braces in the CLL syntax.

There are, of course, many ways to implement function definitions. The method chosen here was to use a syntax similar to that used for arrays and structs:

  function <function name> ([<parameter name>]...) {
    < function body>
  }
in the CLL syntax, or, in the PLL syntax:
  function <function name> ([<parameter name>]...)
  begin
    < function body>
  end
(Note that line endings are simply white space and may be used or omitted freely.)

The function definition is encapsulated in a LocalFunction object which is stored in the variable named by <function name>. Since functions can thus be values of variables, they may be assigned to other variables and passed as arguments to functions.

The Euclidean algorithm for greatest common divisor can be written thus in the CLL notation:

  function gcd(n, m) {
  local remainder;
  if (n < 0) return gcd(-n, m);
  if (m < 0) return gcd(n, -m);
  if (n < m) return gcd(m,n);
  remainder = n%m;
  if (remainder != 0) return gcd(m, remainder);
  return m;
}
or, in the PLL notation:
function gcd(n, m) begin
  local remainder;
  if n < 0 then return gcd(-n, m);
  if m < 0 then return gcd(n, -m);
  if n < m then return gcd(m,n);
  remainder := n mod m;
  if remainder <> 0 then return gcd(m, remainder);
  return m;
end
When the interpreter encounters a function definition, it simply scans over the text of the function and saves it as a string in a LocalFunction object. When the function is called, the string containing the text of the function is extracted and the interpret() function is called recursively.

The local statement consists of a list of variable names. If it is used, it must be either the first statement of the script or the first statement of a function definition.


Overview of Modifications

The modifications to support local function definitions are substantially more extensive than those required to support arrays and subscripts, on the one hand, or structures and objects, on the other.

The most obvious changes are the changes to the syntax. Not only added syntax was required, but a small modification to existing syntax was required.

The changes to the supporting code are somewhat more extensive. The first and largest set of changes to supporting code is to the common definitions module. These changes include the definition of an LocalFunction struct, to encapsulate function definitions. These modifications are sufficient to provide support for the direct execution interpreters. They are also necessary for all the other interpreter architectures.

The second set of changes, to the bytecode interpreter module, added to the changes to the common definitions, suffices to support the direct compilation interpreters. These changes are also necessary for the astci interpreter.

A set of changes to the definitions of the AstNode class is also required for both the astxi and astci interpreters.

Finally, a set of changes to the treewalk functions of the astxi and astci is required to add structures and objects to the abstract syntax tree interpreters.


Syntax Modifications

Tokens Added to the Grammar

To add function definition support to the syntax, it was necessary to add three new tokens to the grammar:
name list
A name list is a comma delimited list of names. The name list token is used both to specify the parameter names and the local variables in a function definition. The name list token is identical to the name list token defined in the Structures and Objects extension.

Note that the name list token could be used to implement the dump statement.

(AgStack<AgString>) name list
  -> name:n            =AgStack<AgString>().push(n);
  -> name list:stack, ',',  name:n =stack.push(n);
The same implementation of the name list is used in dxi, dci and ast parsers. To make sure that the AgStack object is properly constructed on the parser stack, a wrapper statement has been added to the configuration section.

local statement
A local statement has been added to the language in order to provide the capability of using local variables inside a function without disturbing the value of like named variables used outside the function.

As with all extensions, there are a number of ways that the local statement could have been implemented. In many ways, the most convenient method of implementing scopes is to use a push down stack of symbol tables. On entering a scope, the current symbol table is stacked and a new symbol table is created. On exiting the scope, the previously active symbol table is restored. In order to handle variable access correctly, symbol table lookups must scan the current symbol table and all stacked tables until the symbol is found or there are no more symbol tables.

Such an implementation, however, must usually be designed in from the beginning. In the present circumstance, it would require retrofitting all symbol table lookups in the interpreters. Accordingly, another approach was taken.

The approach taken to implementation of the local statement is to use a single symbol table together with a stack. On entry to a function, the values of the local symbols are stacked and then set to uninitType. On exit from the function, the values are restored from the stack.

The local statement, if used, must be the first statement of a script or function. It has the following syntax, in both the CLL and PLL languages:

  local statement
   ->
   -> "local", name list, ';'
The reduction procedures for these rules depend, of course, on the interpreter architecture. They do not, however, depend on the choice of language, CLL or PLL. Here is how they are implemented:
dxi
The local localVariables() function is called to save the values of the listed variables on a stack. The original values will be restored on exit from the interpret() function.

dci
The list of local variable names is saved and passed on to the ScriptMethod constructor, where it is used by the apply() function to save values on entry and restore them on exit.

ast
A NameListNode is created and returned.

function body
Using the context variable and the input pointer, the function body production identifies the beginning and end of the function text and copies it to an AgString object:
(AgString) function body
 -> local statement, statement list ={
      return AgString().append(
      (const char *) CONTEXT.pointer,
      PCB.pointer - CONTEXT.pointer);
 }
The same reduction procedure is used for all interpreters.

function definition
The function definition differs from the C function definition in the use of the key word "function" and the lack of any return type. The return type of any function is automatically Value. If a function does not specify a return value, a value of uninitType is returned. The CLL syntax is:
function definition
 -> "function", name,'(', name list, ')',
      '{', function body, '}'
The PLL syntax is:
function definition
 -> "function", name,'(', name list, ')',
      "begin", function body, "end"
The reduction procedures for these rules depend, of course, on the interpreter architecture. They do not, however, depend on the choice of language, CLL or PLL. Here is how they are implemented:
dxi
The reduction procedure calls the local, dxi version of the defineFunction() function to create a LocalFunction object and save it as the value of the variable given as the function name.
dci
The reduction procedure calls the local, dci version of the defineFunction() function to generate code to define the function and save it as the value of the variable given as the function name.
ast
The reduction procedure creates and returns a typeFunction node.


New and Changed Grammar Rules

In order to integrate the function definition syntax with the rest of the grammar, one rule has been added to the grammar:
simple statement
 -> function definition
This rule simply ties the function definition into the grammar. There is no reduction procedure for this rule in any of the parsers.

One rule has been changed, to provide for local variables in a script or function:

script $
 -> local statement, statement list, eof
Because of this rule, the local statement, if it exists at all, must be the first rule in the script. Note that the function body token imposes the same condition.

The "local" statement works by saving the values, if any, of the listed variables, and then restoring them on exit from the function. Thus, if two temporary variables are needed, one can specify

  local foo, bar;
to define them.

The parameters of the function are also treated as local variables, just as is the norm in C. Note that any variable may be used within a function, not just local variables, so that function calls can have side effects.

Because the direct execution interpreters require special syntax selection to handle loops, the changed rule for the dxi case is somewhat different:

  (Value) syntax selection $
   -> SCRIPT, local statement, statement list, eof  =Value();
Note that although the reduction procedure is unchanged in dxi, the dci and ast parsers need slight modifications in their reduction procedures in order to store away the list of local variables.


New Configuration Statement

One configuration statement has been added to all the parsers in order to support the name list token:
  wrapper {AgStack<AgString>}
This provides proper C++ handling of the AgStack object on the parser value stack.


New Parser Control Block Extensions

New data fields and new methods have been added to the parser control block as follows:
dxi
AgStack<AgString> localList;
This AgStack object is used to record the names of the local variables that must be saved on entry and restored on exit from a script or function.

AgStack<Value> saveStack;
This stack provides storage to save the values of the variables on the localList.

Value callFunction(const AgString &name, const AgStack<Value> &);
This function, local to the parser control block, checks for a variable with the given name. If the value of this variable is a local function definition and the number of arguments is correct, the local function is called. Otherwise, the global version of callFunction() is invoked to try calling an external function with the given name.

void defineFunction(const AgString &variableName, const AgStack<AgString> &parameterNames, const AgString &text);
This function is used to create a LocalFunction object with the specified parameter names and body text. The object is then saved as the value of the named variable.

void localVariables(const AgStack<AgString> &list);
This function protects the current values of the listed variables, by pushing their values onto the saveStack.

dci
AgDictionary<AgString> functionNames;
This AgDictionary is used to translate the names of functions that are called into a compact set of integers for use by the CALL instruction. As the parser identifies function calls, it records the name of each function called in this dictionary. The contents of the dictionary are eventually handed off to the ScriptMethod constructor for use by the bytecode interpreter.

AgStack<AgString> localList;
This AgStack object is used to record the names of the local variables that must be saved on entry and restored on exit from a script or function.

CodeFragment defineFunction( const AgString &n, const AgStack<AgString> &nameList, const AgString &text);
This function generates a Constant object which, in turn, contains a LocalFuntion object which describes the function. Code is then generated to fetch the constant and store its value in the named variable. Note that the CALL opcode has been modified to take the number of arguments to the function on the stack.

ast
No new extensions to the parser control block are required.


The LocalFunction Struct

Introduction

The LocalFunction struct serves to encapsulate the definition of a function that has been defined using the function statement. It identifies the names of the parameters and the text which comprises the body of the function. Note that the data members of the struct are defined to be constant, since once the function is defined, there is no reason to change these values. By defining them as const, unnecessary copying of data is reduced.


Member Fields

const AgArray<AgString> parameterList;
Contains a list specifying the names of the parameters used in the body of the function.

const AgString text;
This field contains the body of the function as an ascii string. An implementation of the interpreter that used a bytecode interpreter might well replace this field with an array of bytecode.

Constructors/Destructor

LocalFunction(const AgStack<AgString> &p, const AgString &t);
The primary constructor takes references to a stack of parameter names and a string of body text. The constructor converts the stack of parameter names to an array of parameter names.

LocalFunction(const LocalFunction &);
The copy constructor exists simply for completeness.

~LocalFunction();
There is nothing for the destructor to do. The AgArray and AgString objects both have destructors which will take care of freeing storage when it becomes necessary.

Member Functions and Operators

AgString asString() const;
Returns a string representation of the function. This implementation simply creates a string of the form "function(a,b,..)" where a, b, ... are the parameters of the function.

int hash(int startValue = 0);
This function calculates a hash code for the instance using the hash functions for the AgArray and AgString members. Creating a hash code allows the use of LocalFunction objects in AgSet, AgMap and AgDictionary containers.

int operator == (const Instance &);
Two instances of a LocalFunction object are equal if they have the same parameters, in the same order, and the function bodies are identical. The equality operator supplements the hash function to allow use of Instance objects in AgSet, AgMap and AgDictionary containers.


Modifications to the Value Class

Introduction

The changes to the Value class required to support structures and objects are as follows:
  1. Add a new type, functionType to the Type enumeration.

  2. Since the LocalFunction struct has a constructor, and thus requires the use of the ValueWrapper class, add an entry to the anonymous union to guarantee adequate space for it.
    char functionSpace[sizeof(ValueWrapper<LocalFunction>)];
    
    It is good practice to include such a sizing entry for each class that uses the ValueWrapper class, since not all classes can be counted upon to fit within a double.

  3. Add a constructor to the Value class that will create a functionType Value object containing a given LocalFunction object.

  4. Add functionType cases to the switch statements in the following functions (see Added Cases):

  5. In addition to the LocalFunction constructor, add the following new functions to the Value class to support the above modifications (see Added Functions):

  6. Add string constants for error messages corresponding to new error modes:
    • Trying to call something that is not a function.


Added Cases

Value &setValue(const Value &v);
In the setValue() function, used by constructors and the assignment operator, a case corresponding to the functionType has been added. It uses the placement new operator of the ValueWrapper class to store the LocalFunction instance in the space provided by the anonymous union.

~Value();
In the destructor, the wrapper for the LocalFunction object must be deleted. Since the fields in the LocalFunction struct are reference counted, the actual data will only be deleted if this is the last reference to it.

int hash(int) const;
The functionType case computes a hash code based on both the parameter list and the body text of the function.

AgString asString() const;
The functionType passes the call through to the asString() implementation.

Value &operator = (const Value &v);
A case has been added to delete the ValueWrapper containing the old value of the LocalFunction object.

Deleting the old value is exactly the same as in the destructor.

int operator == (const Value &v) const;
The function case passes the operation back to the equality operator defined for the LocalFunction class.


Added Functions

Value(const LocalFunction &);
This constructor creates a functionType Value object.

void assertFunction() const;
Throws an ErrorMessage exception if the Value object is not functionType.

LocalFunction &getFunction() const;
If the Value object contains a function definition, this function returns a reference to it. Otherwise, the function throws an ErrorMessage exception.

int isFunction() const;
Returns non-zero if the Value object contains a function definition. Otherwise, the function returns zero.


Bytecode Interpreter Modifications

Extending the bytecode interpreter, whether for the CLL or the PLL language, to handle structures and objects requires the following changes:
  1. Define localList as an array of strings. This contains the names of the local variables the interpreter should save on entry and restore on exit.

  2. Define functionNames as an array of strings. This list contains the names of variables that are used as function names in function calls. At compile time, a dictionary is used to identify the names of functions that are called, so that the name of the function can be identified by a small integer. When the ScriptMethod object is initialized, the contents of the dictionary are copied to this array so that at run time the name of the function to call can be retrieved. Note that if the dataset dictionary were used for this functionality, and a script called a library function, sqrt() for instance, the name of the library function would be incorrectly added to the dataset.

  3. An initializer for functionNames has been added to the ScriptMethod constructors.

  4. The CALL case in ScriptMethod::list() has been changed to take the name of the function from the functionNames array rather than from the function table. It also takes the number of arguments from the top of the stack. Note that in the baseline interpreters, the function is actually identified at compile time. In the this extension, that cannot be done, since it may be the user's desire to override a library function definition. Thus the number of arguments is simply noted at compile time, and furnished as an argument on the stack at run time.

  5. Define a local stack variable, saveStack, in ScriptMethod::apply(). On entry to the function, scan over the variables listed in localList and push their values onto saveStack. On exit from the function, scan localList in reverse order and restore the named variables.

  6. Modify the CALL case in the ScriptMethod::apply() function so that:
    1. it fetches the function name from the functionNames array,
    2. finds the value of the named variable
    3. If the named variable identifies a function and the number of operands is correct, the named function is called. Otherwise, an attempt is made to call a library function with the given name.
    The function call process begins by saving the variables named in the parameter list. It then makes a recursive call to the interpret() function. On return, the parameter variables are restored to the values they had before the function was executed.


Abstract Syntax Tree Modifications

Extending the Abstract Syntax Tree definitions, whether for the CLL or the PLL language, to handle structures and objects requires the following changes:
  1. Add new node types to the NodeType enumeration to support the following types of nodes:
    typeFunction
    This node type represents a function declaration.
    typeNameList
    Identifies a node that contains a list of names. It is implemented as a class, NameListNode, derived from AstNode.

  2. Add a struct, subnodesFunction to the anonymous union of the AstNode class to allow access to the subnodes by name.

  3. Define a specialized node class, NameListNode, derived from AstNode to encapsulate the typeNameList node. This class is required since this is a leaf node of the syntax tree.

  4. Add subnode counts for the newly defined node types, typeFunction and typeNameList to the nSubnodes[] array.

  5. Increment the subnode count for typeScript node types in the nSubnodes[] array to reflect the fact that an additional subnode has been added to this node.

  6. Implement constructors for the newly defined node class, NameListNode.

  7. Define a wrapper function, nameListNode(), to simplify the creation of typeNameList nodes.


Abstract Syntax Tree Interpreter Modifications

The changes to the astxi interprereter required to support local function definitions are:
  1. In the Interpreter::execute() function, modify the typeScript case to fetch a reference to the list of local variables and protect them on a local stack before making the recursive call to the interpret() function. On return, restore the variables from the stack.

  2. In the Interpreter::execute() function, add a typeFunction case to handle function definitions. In this case, a variable name, parameter list, and function text are extracted from the subnodes. A LocalFunction object is created from the parameter list and the function text, and then assigned to the named variable.

  3. In the Interpreter::evaluate() function, modify the typeFunctionCall case to check first for a locally defined function. If one is not found, try calling the like named library function.

    If a local function is found, save the variables named in the parameter list on a local stack. Then make a recursive call to the interpret() function. On return, restore the parameter variables to the values they had before the function was executed.

The changes to the astci interpreter required to support local function definitions are:

  1. Add a functionNames dictionary to the Compiler class:
      AgDictionary<AgString> functionNames;
    
    This dictionary provides a means of identifying function names without inadvertently adding the names of library functions to the global symbol table. The dictionary is created by the compile() function as it walks the abstract syntax tree.

  2. Add a localList stack to the Compiler class:
      AgStack<AgString> localList;
    
    The list is created by the compile() function as it walks the abstract syntax tree. It is then passed to the ScriptMethod constructor, where it is used by the ScriptMethod::apply() function to save and restore the values of local variables.

  3. Add a typeScript case to the Compiler::compile() function. Note that in the baseline interpreter, the typeScript node contains a single subnode, typeStatementBlock, which is handled by the default case in the switch statement.

    In this case, the list of local variables is extracted from the typeNameList subnode and assigned to localList. Then the compile() function is called recursively to compile the body of the script.

  4. Add a typeFunction case to the Compiler::compile() function. This case generates a Constant object which, in turn, contains a LocalFuntion object which describes the function. Code is then generated to fetch the constant and store its value in the named variable. Note that the CALL opcode has been modified to take the number of arguments to the function on the stack.

  5. Add initializers for functionNames and localList to the constructor for ScriptMethod. After compiling the bytecode, copy the values of these lists from the Compiler object.



Table of Contents | Parsifal Software Home Page


XIDEK
Extensible Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.