Parsing

We saw that top-down parsers may need to backtrack when they select the wrong production. We want to avoid backtracking.

This is where predictive parsers come in useful

LL(1): left to right scan, left-most derivation, 1-token look ahead
LR(1): left to right scan, right-most derivation, 1-token look ahead

LL(1) Grammar

In order to use LL(1) parsers, the context-free grammar has to be:

unambiguous
without left recursion
left factored

Eliminating Left Recursion

Consider the grammar fragment:

<foo> ::= <foo>A
        | B

where A and B do not start with <foo>. We can re-write this as:

<foo> ::= B<bar>
<bar> ::= A<bar>
        | epsilon

where <bar> is a new non-terminal. This fragment contains no left recursion.

Left Factoring

For any two productions, we would like a distinct way of choosing the correct production to expand.

We define FIRST (A) as the set of terminals that appear first in some string derived from A.

For a terminal w, we can say: $w \in FIRST(A) \iff A \Rightarrow wz$

Now going back to our two productions, we do FIRST(production 1) $\cap$ FIRST(production 2) $=\phi$ . This would allow the parser to make a correct choice with a look ahead of only one symbol.

FIRST and FOLLOW sets

FIRST Set Calculation

Rules to calculate the FIRST set:

FIRST(terminal) is (terminal)
If $A ::= a$ , and a is a terminal:

${a} \in FIRST (A)$

If $A ::= B$ and rule B does not exist in grammar:

$FIRST(B) \in FIRST(A)$

If $A ::= B$ and rule B does exist in the grammar:

${(FIRST(B) - \epsilon) \cup FIRST(a)} \in FIRST (A)$

FOLLOW Set Calculation

The follow set of non-terminal $A$ contains all the terminals that appear after $A$ in any string generate by the grammar $G$ .

Rules to calculate the FOLLOW set:

{$} $\in$ FOLLOW(S)
If $A ::= aB$

FOLLOW(A) $\in$ FOLLOW (B)

If $A ::= aBy$ and $y$ does not exist in the grammar

FIRST(y) $\in$ FOLLOW(B)

If $A ::= aBy$ and $y$ does exist in the grammar:

${(FIRST(y)-\epsilon)} \cup FOLLOW(A)} \in FOLLOW(B)$

Parsing tables

LL(1) Parsing

In order to implement an LL(1) parser, we need to use the following data structures:

parse table (2D array)
stack (contains the derivations)
list (that will contain the token input stream)

LL(1) Error Recovery

What happens when the parser discovers an error?

Approach 1: stop all parsing activity and return an error message
Approach 2: try to continue parsing (is possible) and see if there are more errors along the way

Which approach does your compiler take?

An error is detected when:

the terminal on top of the stack does not match the next input token
the parsing table cell from which we are supposed to pull the next production is empty

What does the parser do?

It enters the panic-mode error Recovery
Based on the idea of skipping symbols on the input until a token in the SYNCH set appears

Let $S$ be a set of tokens called a synchronization set (SYNCH). Let $s \in S, s$ is called a synchronization token.

Place all symbols in FOLLOW(A) into the SYNCH(A) set for non-terminal A. If we skip tokens until an element of SYNCH(A) is seen and we pop A from the stack, it's likely that parsing can continue.

The panic-mode error recovery can be implemented using the SYNCH set(s) as follows:

Scenario 1: If there is a non-terminal at the top of the stack, discard input tokens until you find a synch token, then pop the non-terminal
Scenario 2: If there is a terminal at the top of the stack, we could try popping it to see whether we can continue. Assume that the input string is actually missing that terminal

A NON LL(1) Grammar

Consider the grammar:

<stmt> ::= if <expr> then <stmt>
        | if <expr> then <stmt> else <stmt>

Needs left factoring, which gives:

<stmt’> ::= else <stmt> | ε

Let’s get the FIRST and FOLLOW sets

FIRST(stmt) = {if}
FIRST(stmt’)={else, ε}
FOLLOW(stmt) = {$, else}
FOLLOW(stmt’)={$, else}

The problem arises because for an input token else and stack top of stmt’, we do not know which production to choose:

<stmt’> else <stmt>
<stmt’> ε

Therefore, this is not an LL(1) grammar

Software Process Models

Concurrency