Syntax Analysis
Context free grammars
A Context Free Grammar (CFG) consists of
- Terminals
- Non-terminals
- Start symbol
- Productions
A language that can be generated by a grammar is said to be a context-free language.
Terminals
They are the basics symbols from which strings are formed. These are the tokens that were produced by the Lexical Analyser
Non-terminals
They are syntactic variables that denote sets of strings. One non-terminal is distinguished as the start symbol.
The productions of a grammar specify the manner in which the terminal and non-terminals can be combined to form strings.
Derivations
<expr> = <expr> <op> <expr>
This is read as "expr derives expr op expr".
<expr> = <expr> <op> <expr>= id <op> <expr>= id * <expr>= id*id
This is called a derivation of id*id from expr.
If is a production and and are arbitrary strings of grammar symbols we say:
If , we say derives .
means "derives in one step"
means "derives in zero or more steps."
means "derives in one or more steps."
is grammar
is the start symbol
is the language generated by
Strings in may contain only terminal symbols of . A string of terminals is said to be in L(G) if and only if .
A language that can be generated by a grammar is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent.
Parse Trees
expr|| | |expr op expr| | |id + id
This is called: Leftmost derivation
Precedence
A problem with parse trees is that it does not enforce precedence. However we can enhance production rules to add precedence.
<expr> ::= <expr> + <term>| <expr> - <term>| <term><term> ::= <term> * <factor>| <term> / <factor>| <factor><factor> ::= num| id
Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Eliminating Ambiguity
Sometimes ambiguous grammar can be rewritten to eliminate the ambiguity.
<stmt> ::= <matched>| <unmatched><matched> ::= if <expr> then <matched> else <matched>| other stmts<unmatched> ::= if <expr> then <stmt>| if <expr> then <matched> else <unmatched>
Top-down parsing
A top-down parser starts with the root of the parse tree, labelled with the start or goal symbol of the grammar.
To build a parse tree, we repeat the following steps until the leaves of the parse tree match the input string.
- At a node labelled A, select a production A ::= and construct the appropriate child for each symbol of
- When a terminal is added to the parse tree that does not match the input string, backtrack
- Find the next non-terminal to be expanded.