Suppose we are parsing a language which has if-then and if-then-else statements, with a pair of rules like this:
if_stmt:
IF expr THEN stmt
| IF expr THEN stmt ELSE stmt
;
Here we assume that IF, THEN and ELSE are terminal symbols for specific keyword tokens.
When the ELSE token is read and becomes the look-ahead token, the contents of the stack (assuming the input is valid) are just right for reduction by the first rule. But it is also legitimate to shift the ELSE, because that would lead to eventual reduction by the second rule.
This situation, where either a shift or a reduction would be valid, is called a shift/reduce conflict. Ocamlyacc is designed to resolve these conflicts by choosing to shift, unless otherwise directed by operator precedence declarations. To see the reason for this, let’s contrast it with the other alternative.
Since the parser prefers to shift the ELSE, the result is to attach the else-clause to the innermost if-statement, making these two inputs equivalent:
if x then if y then win (); else lose;
if x then do; if y then win (); else lose; end;
But if the parser chose to reduce when possible rather than shift, the result would be to attach the else-clause to the outermost if-statement, making these two inputs equivalent:
if x then if y then win (); else lose;
if x then do; if y then win (); end; else lose;
The conflict exists because the grammar as written is ambiguous: either parsing of the simple nested if-statement is legitimate. The established convention is that these ambiguities are resolved by attaching the else-clause to the innermost if-statement; this is what Ocamlyacc accomplishes by choosing to shift rather than reduce. (It would ideally be cleaner to write an unambiguous grammar, but that is very hard to do in this case.) This particular ambiguity was first encountered in the specifications of Algol 60 and is called the “dangling else” ambiguity.
The definition of if_stmt above is solely to blame for the conflict, but the conflict does not actually appear without additional rules. Here is a complete Ocamlyacc input file that actually manifests the conflict:
%token IF THEN ELSE variable
%%
stmt: expr
| if_stmt
;
if_stmt:
IF expr THEN stmt
| IF expr THEN stmt ELSE stmt
;
expr: variable
;