Compilers@di.uminho.pt

Concepts

LL(k) ---- LL(1) Parsers

LR(k) ---- LR(1) Parsers

LR(0) ---- SLR(1) Parsers

LALR Parsers

LR vs. LL Parsers

Language

Lexical Analysis

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

LL(k) ---- LL(1) Parsers

LL(k) - Left to right, Leftmost derivation with k lookahead symbols

An LL(k) parser is a top-down parser, that is, it decides which production to use by looking at the next k tokens. This means that an LL(k) grammar must not have a point where the alternative productions have the same prefix, and that a production cannot be left-recursive - the left hand side must always resolve to a terminal. In addition, an LL(k) grammar must have a fixed value of k. Most LL(k) parsers are actually LL(1) parsers that use lookahead trees where it's necessary to disambiguate rules.

LL(k) parser generators

Modern parser generators that generate LL parsers with multi-token lookahead include:
1. ANTLR (old PCCTS)
2. Coco/R
3. JavaCC
4. Parsec
5. SLK
6. Spirit Parser Framework

Please give a look at following document (definitions over LL(1) parsers and algorithm to construct table based LL(1)): Algorithm.pdf

Example of a LL(1) grammar and his table: LittleGarden.pdf

Go up

LR(k) ---- LR(1) Parsers

LR - Left-to-right parse, Rightmost derivation. The LR(k) class of grammars is weaker than the context-free grammar class, but is still extremely expressive. In contrast to the LL(k) parsers where k lookahead symbols must uniquely predict which alternative to attempt (i.e., on the left edge), the LR(k) parsers make decisions at the right edge, after having seen the entire production.

LR(k) recognizers - and their variants such as LALR(k) - are stronger than LL(k) recognizers because the LR strategy uses more context information. For an LR parser, the context consists of all grammar productions consistent with the previously seen input. This context often includes several "pending" grammar productions. Intuitively, an LR(k) parser attempts to match multiple productions at the same time and postpones making a decision until sufficient input has been seen. In contrast, the context for an LL parser is restricted to the sequence of previously matched productions and the position within the current grammar production being matched. An LL(k) parser must make decisions about which production to match without having seen any portion of the pending productions - it has access to less context information. Hence, LL(k) parsers rely heavily on lookahead.

The LR(k) Parser is a standard iterative algorithm that in each iteration decides the action to perform based on the current (analysis) state and on the next input (terminal) symbol under processing. That decision is taken according to the transition function of a pushdown deterministic finite automaton built systematically from the grammar. The transition function is kept in memory splitted into 2 tables: the ACTION table is indexed by State x Terminal and each entry has a compound value (the type of action to be performed - accept, reject (error), shift, reduce - and the next State or the Production-number); the GOTO table is indexed by State x Non-Terminal and each entry has a compound value (the type of action to be performed - shift - and the next State). The state, after a "shift" action, is pushed onto the LR(k) parsing stack; the state on the top will be the next state, to be considered in the next iteration. During a "reduce" action states are poped out of the stack, as many as the number of symbols in the right hand side of the production to be reduced.

Go up

LR(0) ---- SLR(1) Parsers

Please give a look at following example: XMLDocument.pdf

Go up

LALR Parsers

LALR - Look Ahead Left-to-right parse, Rightmost-derivation.

LALR parsers are a specialized form of LR parsers that can deal with more context-free grammars than Simple LR parsers but less than LR(1) parsers can.

LALR parsers were introduced as practical parsers for deterministic languages. Rather than building an exponential number of LR(k) states, LALR(k) parsers add lookahead sets to the actions of the small LR(0) parser. While SLR uses follow sets to construct reduce actions, LALR uses lookahead sets, which are more specific because they take more of the parsing context into account.

In LR(1) parsing, an item A ::= a (s1) is different from A ::= a (s2) if s1 is different from s2. This results to a large number of states since the combinations of expected lookahead symbols can be very large. To reduce the number of states, when we have two items like those two, instead of creating two states (one for each item), we combine the two states into one by creating an item A := a (s3) where s3 is the union of s1 and s2. Since we make the expected lookahead sets larger, there is a danger that some conflicts will have worse chances to be resolved. But the number of states we get is the same as that for LR(0). This grammar is called LALR(1).

LALR parser generators:

1. AnaGram
2. BEAVER
3. CUP
4. ELI
5. Gold
6. Lemon
7. LISA
8. Ulm's Modula-2 System
9. YACC
10. YaYacc

Go up

LR vs. LL Parsers

An LL parser is much easier to understand than an LALR parser, easier to write and debug and it has better error recovery semantics. A lot of non-LL grammars can be easily munged into LL(k)-form, in a lot of cases, where k=1. LL parser generators tend to produce smaller tables than LALR parsers.

Go up

Language

Given an alphabet T of terminal symbols (or words), a language is a set of strings, or sequences of symbols of T, called sentences.
A language is defined by a grammar.

See also: Grammar.

Go up

Lexical Analysis

Lexical analysis is the compiler's phase in which the stream of characters, making up the source program, is read from left-to-right and grouped into tokens for terminal symbols that are sequences of characters having a collective meaning.

Go up