Parsifal Software

Home

Trial Copy

Intro. to Parsing

Users Say...

Special Features

Notation Summary

New 2.01 Features

File Trace

Grammar Trace

Glossary

Examples

Expression evaluator (freeware)

XIDEK interpreter kit (freeware)

Lex/Yacc Comparison

If-else ambiguity

Contact Parsifal

AnaGram


Resolving the General Dangling Else/If-Else Ambiguity


A troublesome feature of grammars for many programming languages, including C and C++, is a recurring problem known variously as dangling else, if-else ambiguity, or if-else conflict. We show a way to overcome this problem by syntax alone, yielding a conflict-free syntax without the need for "disambiguation rules". In addition, we provide an overview of the algebraic process used to develop the conflict-free syntax. In the following discussion, we use the notation of the AnaGram LALR parser generator to describe the syntax.


Description of the Ambiguity

Commonly, the syntax for if-else statements is written thus:
  if statement
    -> if clause, statement
    -> if clause, statement, "else", statement

  statement
    -> simple statement
    -> if statement
    -> loop statement
where simple statement subsumes expression statements, block statements, break and continue statements, and even the do/while statement; in short, any statement which is neither left nor right recursive. loop statement subsumes while statements and for statements, i.e., right recursive statements of the form:
  loop statement
    -> loop header, statement
This syntax is ambiguous, as can be illustrated by the following example:
  if (conditionA) if (conditionB) statementA; else statementB;
Using the syntax given above, this statement can be interpreted either as:
  if (conditionA) {
    if (conditionB) statementA; else statementB;
  }
or as:
  if (conditionA) {
    if (conditionB) statementA;
  }
  else statementB;
Both interpretations are consistent with the syntax definition given above, but they have very different outcomes. Conventionally, parsers are rigged using some sort of trick to select the first interpretation. This trick is often called a "disambiguation rule". In AnaGram, you could use the sticky attribute statement:
  [
    sticky {statement}
  ]
However, such tricks are not necessary, since the statement syntax can be made conflict-free as described below. A simpler conflict-free syntax for statements is provided as an example in section 4.3 of "Compilers: Principles, Techniques and Tools" by Aho, Sethi and Ullman, but their syntax is incomplete in that it does not, in fact, provide for right-recursive statements such as while and for.


An Unambiguous Syntax for Statements

The problem with the conventional syntax for statements is that it represents a simple taxonomy of statements rather than an appropriate description of syntax. Although it is true that there are if statements, loop statements and simple statements, this is not the most useful way to classify statements from a syntactic point of view.

The following syntax requires a following else to be paired with the most recent unpaired if, thus disallowing the if-else ambiguity.

We define an open statement as one which has at least one if that is not paired with a following else within the statement. A closed statement either does not contain an if at all, or if it does, the if is paired with a following else. We can then write the statement syntax as follows:

  statement
    -> open statement
    -> closed statement

  open statement
    -> if clause, statement
    -> if clause, closed statement, "else", open statement
    -> loop header, open statement

  closed statement
    -> simple statement
    -> if clause, closed statement, "else", closed statement
    -> loop header, closed statement
In this syntax, we allow only closed statements between an if and its matching else. In other words, between an if and an else an if is allowed only if it is paired with a matching else clause. The net effect is that each else is associated with the nearest preceding if.

In the next section we will show how this syntax can be developed by suitably rewriting our original syntax.


Developing the Conflict-Free Syntax

To see how to resolve the ambiguity and develop the above syntax, let us begin with the customary ambiguous syntax described in the first section and try rewriting statement as
  statement
    -> open statement
    -> closed statement
where, as described above, open statement is any statement that has a "dangling" if, that is, an if which could be paired with a following else. A closed statement is one which does not have a dangling if.

Substituting the above into the second rule for if statement yields:

  if statement
    -> if clause, statement
    -> if clause, closed statement, "else", statement
    -> if clause, open statement, "else", statement
Clearly, the last of these three rules for if statement is always ambiguous, since open statement, by definition, contains one or more ifs not paired with elses, and thus it is not clear which if should be associated with the else.

Therefore, let us eliminate the last rule, leaving our rules for if statement as follows:

  if statement
    -> if clause, statement
    -> if clause, closed statement, "else", statement
One could ask whether it is legitimate to discard a rule so cavalierly. In fact it can be shown, by means of algebraic manipulations too tedious to include here, that all sentences produced by the discarded rule can also be produced by the remaining rules. Thus the discarded rule adds nothing to the grammar but ambiguity.

Now, it remains to determine just which rules for statement belong to open statement and which to closed statement.

To this end, we expand the original rules for statement using the two rules above:

  statement
    -> simple statement
    -> if clause, statement
    -> if clause, closed statement, "else", statement
    -> loop header, statement
The first rule is clearly closed. The second rule is clearly open since it contains at least one unpaired if. The last two rules cannot be classified as they stand, since they would be open or closed depending on whether the final statement were open or closed. Therefore, we expand the final token of the last two rules to yield:
  statement
    -> simple statement
    -> if clause, statement
    -> if clause, closed statement, "else", open statement
    -> if clause, closed statement, "else", closed statement
    -> loop header, open statement
    -> loop header, closed statement
Now, we can reorder the above rules:
  statement
    -> if clause, statement
    -> if clause, closed statement, "else", open statement
    -> loop header, open statement

    -> simple statement
    -> if clause, closed statement, "else", closed statement
    -> loop header, closed statement
By our definitions, the first three rules are open statements and the last three rules are closed statements. So we can finally rewrite the syntax:
  statement
    -> open statement
    -> closed statement

  open statement
    -> if clause, statement
    -> if clause, closed statement, "else", open statement
    -> loop header, open statement

  closed statement
    -> simple statement
    -> if clause, closed statement, "else", closed statement
    -> loop header, closed statement
yielding the conflict-free syntax presented in the previous section.



AnaGram parser generator
Copyright ©1993-2002, Parsifal Software.
All Rights Reserved.


Parsifal Software
Links to: Home page | Trial Copy | Syntax Directed Parsing | Glossary