Parsifal Software


XIDEK
Extensible Interpreter Development Kit
Reference Documentation


PLL Syntax

The CLL language may well be viewed as too "complicated" for some applications. Indeed, many expert C programmers find it difficult to remember the details of the C expression syntax with its many levels of precedence. If the intended class of users of an interpreter is not familiar with C, the advantage of compatibility with C vanishes and it might be desirable to implement a more user-friendly syntax.

To demonstrate how to modify the language to make it "simpler", XIDEK contains an example of an alternate language, many of whose features are drawn from Pascal. It will be noted that while there are substantial changes to the syntax file, there is little in the support code that is specific to one language or another. Here are the differences between PLL and CLL:

Case Sensitivity

Case sensitivity is a feature that may not be desired in a language oriented toward non-computer specialists. To remove case sensitivity from the language, three changes were needed. First, case sensitivity is turned off in the parser by adding the following statement to the configuration segment of the syntax file:
case sensitive = OFF
Making this change has the immediate, unpleasant side effect of making the grammar ambiguous. The problem is the hex integer prefix:
(long) hex integer
 -> {"0x" | "0X"}
Because the lower case and upper case 'x' are no longer distinguishable, the two keywords "0x" and "0X" can no longer be distinguished. The problem is resolved by getting rid of one of them:
(long) hex integer
 -> "0x"
Note that no change need be made to character set specifications. For example, 'a-f' + 'A-F' contains exactly the same characters as 'a-f'.

The last change necessary to eliminate case sensitivity involves the parsing of variable names:

(AgString) name
 -> letter:c                 =AgString().append(c);
 -> name:ns, letter+digit:c          =ns.append(c);
Even though upper and lower case letters are now syntactically indistinguishable, the values of the input characters are still different for upper and lower case. To make variable names case independent, then, requires that the characters that constitute them be converted to a canonical form. Thus, we add a function to convert them all to lower case:
(AgString) name
 -> letter:c           =AgString().append(tolower(c));
 -> name:ns, letter+digit:c    =ns.append(tolower(c));
Note that this also required a new #include file, ctype.h.
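
The effect of the tolower calls can be seen in isolation with a minimal C sketch. The helper name canonical_name is hypothetical and is not part of XIDEK; it simply applies, to a whole string at once, the per-character conversion that the reduction procedures above apply as each character is appended.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of XIDEK: fold a name to lower case in
   place, one character at a time. */
void canonical_name(char *name) {
    for (size_t i = 0; name[i] != '\0'; i++) {
        /* Cast to unsigned char first: tolower is undefined for negative
           values other than EOF. */
        name[i] = (char)tolower((unsigned char)name[i]);
    }
}
```

After conversion, "Total" and "TOTAL" both become "total", so a symbol table lookup treats them as the same variable.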


Expression Syntax

The standard C expression syntax has numerous features that might not be desirable in an interpreter to be used by people who are not professional programmers:
  • The number of levels in the operator precedence hierarchy. Few programmers can remember the details of the hierarchy so they normally use parentheses on subexpressions to be sure.

  • Assignment operators embedded in expressions. These provide side effects that can easily escape a reader's attention. They may provide a small degree of code optimization, but usually at the expense of clarity.

  • The comma operator, used almost exclusively in the initialization clause of for statements. Many experienced programmers are surprised and even confused when they find the comma operator used elsewhere.

  • The comparison operators, which may be chained in C in a manner that can only be considered extremely misleading. For example, by mathematical convention the expression 0 < x < 10 is true for any value of x on the open interval 0 to 10. The C expression, however, is true for every value of x, since it is the result of the first comparison, one or zero, that is compared to 10, and both one and zero are less than 10. Experienced C programmers are, or should be, aware of this. Anybody else could easily fall victim to this notation.
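
The chained comparison in the last item can be checked with a short C sketch. The function names are hypothetical. The first function spells out, with explicit parentheses, how C groups 0 < x < 10; the second is the test that the mathematical notation intends.

```c
/* What C computes for 0 < x < 10: the first comparison yields 0 or 1,
   and that result is then compared with 10. The parentheses only make
   the grouping explicit. */
int chained_comparison(int x) {
    return (0 < x) < 10;
}

/* The test that the mathematical notation intends. */
int in_open_interval(int x) {
    return 0 < x && x < 10;
}
```

For x equal to 50, chained_comparison returns 1 while in_open_interval returns 0.
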
In the PLL syntax, thirteen levels of C expression hierarchy are compressed into three levels of the Pascal hierarchy:
  • At the lowest level of precedence are the relational operators, all at the same level of precedence. Chained comparisons are not allowed. Note that the equality operators are '=' and "<>" instead of the C operators "==" and "!=".
    expression
     -> simple expression
     -> simple expression, relational op, simple expression
    
    relational op
     -> '='
     -> "<>"
     -> '<'
     -> "<="
     -> '>'
     -> ">="
    

  • The next level of precedence consists of operators with the same precedence as addition: addition, subtraction, inclusive or and exclusive or. Note that the keywords "or" and "xor" replace the more recondite '|' and '^'.
    simple expression
     -> term
     -> simple expression, additive op, term
    
    additive op
     -> '+'
     -> '-'
     -> "or"
     -> "xor"
    

  • The next level of precedence consists of operators with the same precedence as multiplication: multiplication, division, remainder, logical and, and the shift operations. Again, the keywords "mod", "and", "shr" and "shl" are used in place of the C equivalents. In deference to Pascal, the pure integer divide operator, "div", has been introduced.
    term
     -> unary expression
     -> term, multiplicative op, unary expression
    
    multiplicative op
     -> '*'
     -> '/'
     -> "div"
     -> "mod"
     -> "and"
     -> "shl"
     -> "shr"
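
One consequence of the compressed hierarchy is that some expressions group differently from their C look-alikes. In C the shift operators bind less tightly than addition, whereas in the grammar above "shl" is a multiplicative op. The following C sketch, with hypothetical function names, spells out both groupings using explicit parentheses. It assumes that "shl" has the semantics of the C << operator.

```c
/* PLL: a + b shl n groups as a + (b shl n), because "shl" is a
   multiplicative op and binds more tightly than '+'. */
unsigned pll_grouping(unsigned a, unsigned b, unsigned n) {
    return a + (b << n);
}

/* C: a + b << n groups as (a + b) << n, because '<<' has lower
   precedence than '+'. */
unsigned c_grouping(unsigned a, unsigned b, unsigned n) {
    return (a + b) << n;
}
```

With a = 1, b = 2 and n = 2, the PLL grouping gives 9 and the C grouping gives 12. By the same reasoning, since "and" binds more tightly than the relational operators, a condition such as x > 0 and y > 0 would need parentheses around each comparison, as in Pascal.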
    

Subsequent precedence levels are the same as in CLL. Note, however, that the unary operators are not the same:

unary op
 -> '-'
 -> "not"
Fortran style exponentiation has been retained. If it is not wanted, it can be removed by changing:
(Value) unary expression
 -> factor          // next higher precedence level
 -> '+', unary expression
 -> unary op, unary expression
to:
(Value) unary expression
 -> primary         // next higher precedence level
 -> '+', unary expression
 -> unary op, unary expression
and deleting the definition of factor.

The final change to the expression syntax is the deletion of the increment and decrement operators.

Because this revised expression syntax no longer has an assignment expression, the definitions of print statement and arg list had to be modified to use expression rather than assignment expression.


Assignment Statement Syntax

In C there are a host of assignment operators. In addition to the simple assignment, there are compound operators corresponding to all of the binary operators excepting only the logical or and logical and. Furthermore, these operators can appear, parenthesized, as a primary anywhere in an expression.

In addition, there is effectively no assignment statement, as such, in the C syntax. Any expression, standing alone, is a statement, although in practice such an expression is usually either an assignment expression or a call to a function that returns void.

Although these idiosyncrasies are often beloved of C programmers, they can be difficult to explain, and can yield quite inscrutable statements. In simpler languages such as Pascal, there is an explicit assignment statement.

Accordingly, the expression statement has been eliminated and replaced with an explicit assignment statement:

assignment statement
 -> lvalue, ":=", expression
Note that the assignment statement, as in Pascal, uses ":=" instead of '='. This usage has a double benefit. First, it eliminates a major source of beginner's error in the use of the equality operator, and second, it emphasizes that the assignment statement sets the value of a variable. The use of the simple equal sign, as in C, allows an uninstructed reader to imagine that the statement asserts equality, as the corresponding mathematical statement implies.
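
The beginner's error in question looks like this in C. The function names below are hypothetical and serve only to contrast the slip with the intended test.

```c
/* The classic slip: '=' assigns 5 to x, and the condition then tests
   the assigned value, which is nonzero, so the branch is always taken.
   The extra parentheses only silence compiler warnings. */
int mistaken_test(int x) {
    if ((x = 5)) return 1;
    return 0;
}

/* What was meant: compare x with 5. */
int intended_test(int x) {
    if (x == 5) return 1;
    return 0;
}
```

With the grammar shown above, the slip cannot be written at all: ":=" appears only in the assignment statement, and '=' appears only as a relational operator.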


Compound Statement Syntax

In C, compound statements are delimited with curly braces:
compound statement
 -> '{', statement list, '}'
where statement list consists of zero or more statements. This has been changed to:
compound statement
 -> "begin", statement list, "end"
Note that if you wish to use keywords in a language other than English, it is a trivial matter to substitute any other pair of words for begin and end.


If Statement Syntax

If statements have been changed from the form:
if (<expression>) <statement>
to:
if <expression> then <statement>
Note the absence of parentheses and the use of then.

The use of else is unchanged.

Remember that, since the keywords in this alternate language are not case sensitive, IF and ELSE can also be used.

Note an important distinction between C and Pascal notation. In C, semicolons are statement terminators. In Pascal, semicolons serve to separate the statements within a compound statement. Thus, in Pascal, the following code is incorrect:

if x > 5 then y := 2; else y := 3;
The error is the semicolon preceding the else. Although it would be possible to rearrange the statement syntax to make it consistent with Pascal, it seems pointless to do so, since the use of the semicolon as a terminator is much easier to explain than the Pascal usage.


While Statement Syntax

While statements have been changed from the form:
while (<expression>) <statement>
to:
while <expression> do <statement>
Note the absence of parentheses and the use of do.


Repeat Until Statement

The C do/while statement has been replaced with the Pascal repeat/until. Certainly there is no reason why a language shouldn't have both or neither. The primary difference between the two is the treatment of the loop exit condition. The do/while statement exits on false, the repeat/until statement exits on true.
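
The inversion of the exit condition can be sketched in C. Both functions below are hypothetical. The second emulates repeat/until by negating the condition of a do/while loop, and the PLL fragment in its comment is only an illustration of how such a loop might be written.

```c
/* C do/while: the loop continues while the condition is true. */
int count_digits_do_while(unsigned n) {
    int count = 0;
    do {
        n /= 10;
        count++;
    } while (n != 0);
    return count;
}

/* The same loop phrased as repeat/until and emulated in C. The
   condition is negated because repeat/until exits when the condition
   becomes true. A PLL rendering might read:
       repeat n := n div 10; count := count + 1; until n = 0;  */
int count_digits_repeat_until(unsigned n) {
    int count = 0;
    do {
        n /= 10;
        count++;
    } while (!(n == 0));
    return count;
}
```

Both functions return 5 for 12345 and 1 for 0, since the body of either kind of loop always runs at least once.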

The second difference is that no begin/end pair is needed to delimit statements within the repeat/until block. A direct translation of the corresponding C syntax would be as follows:

simple statement
 -> REPEAT, statement, UNTIL, expression, ';'
Instead, any number of statements are allowed:
simple statement
 -> REPEAT, statement list, UNTIL, expression, ';'
Note that use of begin and end will not, however, cause an error. It is interesting to note that the same simplification could have been applied to the do/while statement in C.

In some languages, this same technique is used for other block structures. One could just as well define if statements thus:

if statement
 -> "if", expression, "then", statement list, "endif"
 -> "if", expression, "then", statement list, "else",
      statement list, "endif"
One virtue of such a syntax is elimination of the dangling else conflict.
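
For readers unfamiliar with the dangling else, the following C sketch, with hypothetical function names, writes out the two conceivable readings of a nested if with a single else, using braces to make each reading explicit.

```c
/* C attaches an else to the nearest unmatched if, so
       if (a) if (b) x = 1; else x = 2;
   is read as this function. */
int else_binds_to_inner_if(int a, int b) {
    int x = 0;
    if (a) {
        if (b) { x = 1; } else { x = 2; }
    }
    return x;
}

/* The other conceivable reading, which C does not choose. */
int else_binds_to_outer_if(int a, int b) {
    int x = 0;
    if (a) {
        if (b) { x = 1; }
    } else {
        x = 2;
    }
    return x;
}
```

With a true and b false, the first reading yields 2 and the second yields 0. Under the "endif" form, the position of each "endif" states which reading is meant, so the parser never has to choose.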

Similar syntax could be used for while and for loops, without any substantive effort on the part of the developer. For some classes of users, these syntactic forms could be preferable to the use of begin/end, since they make it easier to match the end of a block to the beginning of the block.


For Statement Syntax

The for statement represents the single largest change in the language, in terms of the ramifications it causes.

The actual syntax that has been chosen for implementation is not quite the same as Pascal. It provides for a choice of increment, but does not have an explicit downto alternative:

  for i := 1 to 10 do x := x + i;
  for j := 0 by 2 to 20 do begin
    ...
  end
  for k := 10 by -1 to 1 do begin
    ...
  end
Notice that ":=" is used for the control variable assignment, for consistency with the assignment statement.
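
One plausible reading of these loops can be sketched in C. The function name is hypothetical, and the semantics are assumptions rather than anything this section states: the limit is taken to be inclusive, the sign of the increment is taken to select the direction of the test, and a zero increment is taken to run no iterations.

```c
/* Sum the values the control variable takes in
       for i := start by step to limit do ...
   ASSUMPTIONS (not stated in the source): the limit is inclusive, a
   positive step counts up, a negative step counts down, and a zero
   step runs no iterations. */
long pll_for_sum(long start, long step, long limit) {
    long sum = 0;
    if (step > 0) {
        for (long i = start; i <= limit; i += step) sum += i;
    } else if (step < 0) {
        for (long i = start; i >= limit; i += step) sum += i;
    }
    return sum;
}
```

Under these assumptions the three loops above visit 1 through 10, the even numbers 0 through 20, and 10 down to 1, giving sums of 55, 110 and 55 respectively.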


Break and Continue Statements

These statements have been removed.





Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.