[컴파일러] Chapter 3. Syntax Analysis(Part 2)

4학년 1학기 전공/컴파일러

[컴파일러] Chapter 3. Syntax Analysis(Part 2)

시데브 2025. 4. 14. 02:31

경희대학교 허선영 교수님의 컴파일러 수업을 기반으로 정리한 글입니다.

1. Shift-Reduce Parsing - LR(k) Parsing

predictive parser: 어떤 production이 사용될지 예측
오른쪽 항 전체에 해당하는 입력 토큰을 볼 때까지 결정을 미룲
LR(k)
- Left-to-right parse
- Rightmost derivation
- k-token lookahead
predictive parsing(LL aprsing)보다 더 많은 grammars에 대해서 parsing 가능

- 동작 방식

parser는 stack & input을 가짐
- stack contents & next input token에 의해서, 2가지 동작 중 하나가 결정됨
  - Shift: top of stack에 next input을 push
  - Reduce: choose a production X -> ABC
    - Pop off C, B, A
    - Push X

Initialization
- Stack is Empty
- Parset is at the beginnig of input stream
if $ is shifted -> input stream has been successfully parsed

- Example

-> 어떤 타이밍에 shift/reduce를 진행할지 결정해야 함

to make shift/reduce decisions -> parser uses deterministic finite automata(DFA)
- each state -> corresponds to the contents of stack at some time
- each edge -> is labeled with a symbol(terminal or non-terminal)
DFA를 사용하여 transition table을 build
- make decisions based on transition table

sn: shift in to state n
gn: goto state n
rk: reduce by rule k
a: accept

- Example

shift + state num
reduce + production num

- 4 Actions

Shift(n)
- 입력을 한 칸 앞으로 진행
- 현재 input token과 new state n을 stack에 push
Reduce(k)
- rule k의 우변 길이(symbols 개수) 만큼 stack을 pop
- goto n 수행
- 전이된 new state n을 stack에 push

Accept
- Stop parsing, report success
Error
- Stop parsing, report error

2. LR(k) Parsing

LR(k) parsing -> content of its stack & next k tokens 사용
- 실제로는 k>1은 사용되지 않음 -> parsing algorithm이 복잡해지고, memory도 많이 사용됨
- 대부분의 programming languages는 LR(1) grammars로 표현됨
LR(0) parsing
- lookahead를 아예 보지 않고, shift/reduce decisions 진행

2.1. LR(0) Parsing

stack is empty
input contains 'S' string followed by a '$'
parser position: parser가 보고있는 위치

- Basic Operations

Closure(I): parser position 오른쪽에 위치한 non-terminal로 시작하는 모든 productions를 LR(0) items를 나타내는 I에 추가 -> I가 변하지 않을 때까지

-> S로 시작되는 모든 productions 추가

Goto(I, X): 모든 items에서 parser position을 symbol X 이후로 이동하는 함수
- 현재 state I에서 symbol X를 읽었을 때, 다음 state는 무엇인지 알려줌
- item 집합 I 내부에서, parser position 바로 우측에 X가 있는 items만 골라서 해당 items의 parser position을 한 칸 오른쪽으로 이동 -> 해당 아이템들로 만든 집합에 Clousre 연산 적용

- Example(DFA 생성)

-> DFA 완성

- Building Transition Table

-> DFA 이용하여 작성

-> conflict 발생 시에, LR(0)로 parsing 불가능한 grammar

- 최종 Example

2.2. Simple LR Parsing

LR(0) parser를 기반으로 하지만, reduce할 때 FOLLOW 집합을 사용해 reduce의 범위를 제한

-> state3에서 기존에는 모든 terminals에 대해서 reduce 진행

-> SLR에서는 T의 follow인 +, $에 대해서만 reduce를 진행

-> r3에서도 마찬가지로 E의 follow인 $에 대해서만 reduce 진행

-> LR(0)로는 parsing 불가능, but SLR로는 parsing 가능한 gammar!!

2.3. LR(1) Parsing

LR(0): A의 follower를 모두 고려
SLR: 모든 terminal 고려
LR(1): lookahead symbol만 고려

LR(1) item은 2개의 components를 가짐
- Production: A -> α.β
  - parser position은 현재 분석이 어디까지 진행되었는지 나타냄
- Lookahead symbol: x
  - 아이템이 적용될 수 있는 다음 input symbol
ex) E -> T.+E, $ : T까지 분석을 완료했고, +E를 남겨두었으며, 해당 production은 input symbol이 $일 때 사용 가능하다는 뜻

- Basic Functions

Closure

-> lookahead symbol인 z까지 고려,

- β is empty: FIRST(βz) = FIRST(z)

- β is nullable: FIRST(βz) = FIRST(β) ∪ FIRST(z)

- β is not nullable: FIRST(β)만 확인

Goto

-> LR(0)에서는 단순히 parser position 이동 후 closure 적용

-> LR(1)에서는 parser position 이동 후 closure를 적용하는데, lookahead가 유지돼야 함

- Example

1) First State 빌드 - Closure 연산

- V -> x, $과 V -> x, =는 다른 것으로 취급!!

2) Goto

-> $ sign이 나오는 경우에만 reduce 연산 수행

-> 기존 LR(0), SLR이 수행하던 범위보다 축소

-> but, state 개수가 많아지고 구조가 복잡해짐

2.4. LALR(1)

LR(1)의 table 크기가 너무 커지는 문제
production이 같은데, lookahead symbol이 다른 경우를 merge
LALR(1): Look-Ahead LR(1)

2.5. Parsing Power

3. Real-world Parser Generators

Parser Generator: 문법만 정의해주면, 해당 문법에 맞는 parser를 자동으로 생성해주는 프로그램
- yacc, bison: LALR parser generators for C
Parser Generator Specification
- Input: a set of context-free grammars specifying a parser
  - context-free grammar: 문맥에 상관 없이 어떤 Non-terminal이 들어와도, 그에 대한 production-rule이 존재하는 문법
- Outputs
  - a parser in target language -> 파서 코드 자동 생성
  - a description of state machine -> 어떤 state에서 어떤 symbol을 만나면 어떻게 전이되는지
- Rules: pattern & action
  - pattern: context free grammar
  - action: 문법이 만족되었을 때 실행할 code fragment
  - example) exp: exp PLUS exp { $$ = $1 + $2; }
    - exp PLUS exp라는 pattern -> { $$ = $1 + $2 ; }라는 코드 실행

- Parser Generator Example: GNU Bison

-> yyparse() 호출로 파서 실행

User Declaration
- rules section에서 사용가능한 various values 정의
- terminal, non-terminal symbols, and thier types 선언
  - %token NUM
  - %token <type> NUM
  - %token <type> exp
- precedences(우선순위)선언 -> shift-reduce conflicts 해결
- (optionally) specify start symbol
  - %start prog
Rules
- pattern & action -> pattern에 따라 어떤 action을 취할지
Location
- @$: location of the whole group
- @1: location of the first subexpression

4. Ambigous Grammar

Ambiguous Grammar: 같은 expression에 대해서 2개 이상의 parse tree가 존재하는 grammar
example) 4 + 5 * 6

4.1. Handling Ambiguous Grammar

YACC -> shift-reduce conflicts로 알림
- 4 + 5 * 6
  - stack에 expr + expr이 쌓인 상태에서, current token으로 *가 들어옴
  - parser: expr -> expr + expr rule을 통해서 reduce or shift
  - Prefer shift
- 4 + 5 + 6
  - stack에 expr + expr이 쌓인 상태에서, current token으로 +가 들어옴
  - parser: expr -> expr + expr rule을 통해서 reduce or shift
  - Prefer reduce

-> shift 이후, 위에부터 reduce 진행하니까, 6 * 5를 먼저 reduce

4.2. Directives

Three Solutions
1. Let YACC complain, but check if the choice (shift) is correct
2. Rewrite grammar to eliminate ambiguity
3. Keep grammar, but add precedence directives -> conflicts가 해결되는 방향으로
  - %left, %right, %nonassoc
  - for this grammar -> 라인의 순서가 낮으면 우선순위가 높음
    - $left PLUS MINUS -> PLUS MINUS 동등
    - $left MULT DIV -> MULT DIV 동등
주어진 directives에 따라, YACC은 each terminal & rule에 precedence 부여
- rightmost terminal과 shift할 terminal의 precedence 비교

주어진 shift-reduce conflict에 따라, YACC는 action 수행
- reduce 수행할 rule과 shift 수행할 terminal의 precedence find
  - prec(terminal) > prec(rule): shift
  - prec(terminal) < prec(rule): reduce
  - prec(terminal) = prec(rule)
    - assoc(terminal) = left : reduce
    - assoc(terminal) = right: shift
    - assoc(terminal) = nonassoc : report error -> 결합 불가(오류 처리)
    - associativity(결합법칙): 연산자끼리 우선순위가 같을 때, 어느 방향으로 묶을 것인지 결정하는 규칙

4.3. Dangling Else Problem

Valid Program: if a then if b then S1 else S2

-> if a then if b then s1 else s2 동작을 원하지만, S1이 stack의 top에 위치할 때 shift-reduce conflict 발생

solution: eliminate ambiguity by modifying grammar(matched/unmatched)

4.4. Default Behavior

Directives not specified
- shift-reduce: report error, shift by default
- reduce-reduce: report error, reduce by rule that occurs first
What to do
- shift-reduce: acceptable in well defined class, 그러나 웬만하면 없애는 것이 좋음
- reduce-reduce: unacceptable, rewrite grammar

4.5. %prec Directive

commonly used for the unary minus problem
- %left PLUS MINUS
- %left MULT DIV
consider - 4 * 6
unary minus가 우선순위가 높을 것을 권장, but MINUS의 precedence가 MULT보다 낮음
- -( 4 * 6 ) not ( -4 ) * 6
solution
- %token NUM PLUS MINUS MULT DIV UMINUS
- %left PLUS MINUS
- %left MULT DIV
- %left UMINUS
- expr: MINUX expr %prec UMINUS {}
  | expr PLUS expr {}

참고자료

https://velog.io/@legionary7931/%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%98%EB%B0%8D%EC%96%B8%EC%96%B4-Context-Free-grammar

[프로그래밍언어] - Context-Free grammar

📜 Parser * 📢 Parser의 역할* Parser는 scanner로부터 tokenized 된 word stream을 받아 syntax를 확인하고, 올바른 syntax라면 중간 표현(Intermediate Representation, IR)을 도출해내는 컴파

velog.io

728x90