Parsing
We saw that top-down parsers may need to backtrack when they select the wrong production. We want to avoid backtracking.
This is where predictive parsers come in useful
- LL(1): left to right scan, left-most derivation, 1-token look ahead
- LR(1): left to right scan, right-most derivation, 1-token look ahead
LL(1) Grammar
In order to use LL(1) parsers, the context-free grammar has to be:
- unambiguous
- without left recursion
- left factored
Eliminating Left Recursion
Consider the grammar fragment:
<foo> ::= <foo>A| B
where A and B do not start with <foo>
. We can re-write this as:
<foo> ::= B<bar><bar> ::= A<bar>| epsilon
where <bar>
is a new non-terminal. This fragment contains no left recursion.
Left Factoring
For any two productions, we would like a distinct way of choosing the correct production to expand.
We define FIRST (A) as the set of terminals that appear first in some string derived from A.
For a terminal w, we can say:
Now going back to our two productions, we do FIRST(production 1) FIRST(production 2) . This would allow the parser to make a correct choice with a look ahead of only one symbol.
FIRST and FOLLOW sets
FIRST Set Calculation
Rules to calculate the FIRST set:
- FIRST(terminal) is (terminal)
- If , and a is a terminal:
- If and rule B does not exist in grammar:
- If and rule B does exist in the grammar:
FOLLOW Set Calculation
The follow set of non-terminal contains all the terminals that appear after in any string generate by the grammar .
Rules to calculate the FOLLOW set:
- {$} FOLLOW(S)
- If
- FOLLOW(A) FOLLOW (B)
- If and does not exist in the grammar
- FIRST(y) FOLLOW(B)
- If and does exist in the grammar:
- {(FIRST(y)-\epsilon)} \cup FOLLOW(A)} \in FOLLOW(B)
Parsing tables
LL(1) Parsing
In order to implement an LL(1) parser, we need to use the following data structures:
- parse table (2D array)
- stack (contains the derivations)
- list (that will contain the token input stream)
LL(1) Error Recovery
What happens when the parser discovers an error?
- Approach 1: stop all parsing activity and return an error message
- Approach 2: try to continue parsing (is possible) and see if there are more errors along the way
Which approach does your compiler take?
An error is detected when:
- the terminal on top of the stack does not match the next input token
- the parsing table cell from which we are supposed to pull the next production is empty
What does the parser do?
- It enters the panic-mode error Recovery
- Based on the idea of skipping symbols on the input until a token in the SYNCH set appears
Let be a set of tokens called a synchronization set (SYNCH). Let is called a synchronization token.
Place all symbols in FOLLOW(A) into the SYNCH(A) set for non-terminal A. If we skip tokens until an element of SYNCH(A) is seen and we pop A from the stack, it's likely that parsing can continue.
The panic-mode error recovery can be implemented using the SYNCH set(s) as follows:
- Scenario 1: If there is a non-terminal at the top of the stack, discard input tokens until you find a synch token, then pop the non-terminal
- Scenario 2: If there is a terminal at the top of the stack, we could try popping it to see whether we can continue. Assume that the input string is actually missing that terminal
A NON LL(1) Grammar
Consider the grammar:
<stmt> ::= if <expr> then <stmt>| if <expr> then <stmt> else <stmt>
Needs left factoring, which gives:
<stmt’> ::= else <stmt> | ε
Let’s get the FIRST and FOLLOW sets
FIRST(stmt) = {if}FIRST(stmt’)={else, ε}FOLLOW(stmt) = {$, else}FOLLOW(stmt’)={$, else}
The problem arises because for an input token else and stack top of stmt’, we do not know which production to choose:
<stmt’> else <stmt>
<stmt’> ε
Therefore, this is not an LL(1) grammar