[Compilers] Assignment 1. Lexical Analysis
This post is based on the Compilers course taught by Professor 허선영 at 경희대학교 (Kyung Hee University).
Goals of this Assignment
- Build a lexical analyzer for a new language, using a lexer generator for OCaml (ocamllex)
- Know how to write a regular expression to represent a given set of strings
- Understand how to write and debug an ocamllex specification

ocamllex
- ocamllex: lexer generator for OCaml
- An ocamllex specification consists of three parts, as shown in the example in Snippet 1 (a minimal skeleton is sketched after this list)
- User Declarations: declare auxiliary values & types
- Line 0 to 5
- Lex Definitions: define regular expressions
- Line 7
- Format: let ident = regexp ...
- Rules: contain the rules for lexical analysis
- Line 9 to 11
- Format: rule entrypoint [arg1 ... argn] = parse regexp { action } | ...
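Since Snippet 1 itself is not reproduced in this post, the following is a minimal skeleton of my own (the names letter, word, token, and Eof are placeholders, not part of the assignment) showing the three parts in the format above:
(* 1. User Declarations: plain OCaml between { and } *)
{
  exception Eof
}

(* 2. Lex Definitions: let ident = regexp *)
let letter = ['A'-'Z' 'a'-'z']
let word   = letter+

(* 3. Rules: rule entrypoint = parse regexp { action } | ... *)
rule token = parse
  | word as w       { w }                (* return the matched word *)
  | [' ' '\t' '\n'] { token lexbuf }     (* skip whitespace *)
  | eof             { raise Eof }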

- ocamllex provides predefined special operations for lexical analysis
- Lexing.lexeme lexbuf: return the matched string
- Lexing.lexeme_char lexbuf n: return the n-th character in the matched string
- Lexing.lexeme_start lexbuf: return the start position of the matched string
- Lexing.lexeme_end lexbuf: return the end position of the matched string
- In addition, ocamllex allows defining multiple entry points for rules
- can switch to another entry point by calling entrypoint lexbuf (see the toy example below)
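To make these concrete, here is a small self-contained toy specification (my own sketch, not part of the assignment code) that prints each matched word using the helpers above and switches between two entry points to skip comments:
{ exception Done }

rule token = parse
  | ['a'-'z' 'A'-'Z']+ {
      (* the predefined helpers, applied to the current match *)
      Printf.printf "%s spans %d..%d and starts with %c\n"
        (Lexing.lexeme lexbuf)
        (Lexing.lexeme_start lexbuf)
        (Lexing.lexeme_end lexbuf)
        (Lexing.lexeme_char lexbuf 0);
      token lexbuf
    }
  | "/*" { skip_comment lexbuf }   (* switch to the second entry point *)
  | eof  { raise Done }
  | _    { token lexbuf }

and skip_comment = parse
  | "*/" { token lexbuf }          (* switch back to the first entry point *)
  | eof  { raise Done }
  | _    { skip_comment lexbuf }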

The Fun Language
- Fun: the new programming language introduced in this course
- Goal of the assignments: to build a compiler for the Fun language
- Specification of the language
- Reserved words: fun, in, let, while, do, if, then, else, ref, not
- Identifiers
- An identifier is a sequence of letters, digits, and underscores, starting with a letter
- Upper-case letters are distinguished from lower-case letters
- Numbers
- Each number is a sequence of digits
- A minus sign should be lexed as a separate token (see the token-stream illustration after this list)
- Other Tokens: the operators and punctuation handled by the rules in the lexer below (e.g. ->, !, ||, &, comparison and arithmetic operators, parentheses, :, ;, ,, and # followed by a number for projection)

- Comments: start with /* and end with */
- Whitespaces: newlines (\n), carriage returns (\r), tabs (\t)
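As a quick illustration of these rules (my own example, not from the handout), an input such as let x = 10 - 3 in x would be split by the lexer below into the token stream tok_let, tok_id "x", tok_eq, tok_num 10, tok_minus, tok_num 3, tok_in, tok_id "x".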
Lexical Analyzer for Fun
(* CSE322 Compiler Assignment 1 *)
{
exception Eof
(* create a tuple of the start and end positions *)
let make_pos buf = (Lexing.lexeme_start buf, Lexing.lexeme_end buf)
}
let alpha = ['A'-'Z''a'-'z']
let digit = ['0'-'9']
let projnum = '0' | ['1'-'9'] digit*
let ident = alpha ( alpha | digit | '_')*
rule initial = parse
| "fun" { Tokens.tok_fun(make_pos lexbuf) }
| "->" { Tokens.tok_arrow(make_pos lexbuf) }
| "in" { Tokens.tok_in(make_pos lexbuf) }
| "let" { Tokens.tok_let(make_pos lexbuf) }
| "else" { Tokens.tok_else(make_pos lexbuf) }
| "then" { Tokens.tok_then(make_pos lexbuf) }
| "if" { Tokens.tok_if(make_pos lexbuf) }
| "assign" { Tokens.tok_assign(make_pos lexbuf) }
| "!" { Tokens.tok_bang(make_pos lexbuf) }
| "ref" { Tokens.tok_ref(make_pos lexbuf) }
| "do" { Tokens.tok_do(make_pos lexbuf) }
| "while" { Tokens.tok_while(make_pos lexbuf) }
| "||" { Tokens.tok_or(make_pos lexbuf) }
| "not" { Tokens.tok_not(make_pos lexbuf) }
| "&" { Tokens.tok_and(make_pos lexbuf) }
| ">" { Tokens.tok_gt(make_pos lexbuf) }
| "=" { Tokens.tok_eq(make_pos lexbuf) }
| "<" { Tokens.tok_lt(make_pos lexbuf) }
| "*" { Tokens.tok_times(make_pos lexbuf) }
| "-" { Tokens.tok_minus(make_pos lexbuf) }
| "+" { Tokens.tok_plus(make_pos lexbuf) }
| ")" { Tokens.tok_rparen(make_pos lexbuf) }
| "(" { Tokens.tok_lparen(make_pos lexbuf) }
| ":" { Tokens.tok_colon(make_pos lexbuf) }
| ";" { Tokens.tok_semicolon(make_pos lexbuf) }
| "," { Tokens.tok_comma(make_pos lexbuf) }
| "#" projnum as s { Tokens.tok_proj(make_pos lexbuf, int_of_string (String.sub s 1 (String.length s - 1))) }
| ident as id { Tokens.tok_id(make_pos lexbuf, id) }
| digit+ as num { Tokens.tok_num(make_pos lexbuf, int_of_string num) } (* number literals *)
| [' ' '\t' '\r'] { initial lexbuf }
| "\n" { Errormsg.new_line(make_pos lexbuf); initial lexbuf }
| "/*" { comment 1 lexbuf }
| eof { raise Eof }
| _ {
let msg = Printf.sprintf "Unexpected character: %s" (Lexing.lexeme lexbuf) in
Errormsg.impossible msg
}
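(* second entry point: skips comments, tracking the nesting depth so that nested /* ... */ pairs are handled correctly *)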
and comment depth = parse
| "/*" { comment (depth + 1) lexbuf }
| "*/" { if depth = 1 then initial lexbuf else comment (depth - 1) lexbuf }
| eof { Errormsg.impossible "Unclosed comment" }
| _ { comment depth lexbuf }
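For reference, here is a minimal driver sketch (my own, not part of the assignment skeleton); it assumes the specification above is saved as lexer.mll and processed with ocamllex into a module Lexer, and simply pulls tokens from stdin until Eof is raised:
(* generate the lexer with: ocamllex lexer.mll *)
let () =
  let lexbuf = Lexing.from_channel stdin in
  try
    while true do
      ignore (Lexer.initial lexbuf)   (* each call returns the next token *)
    done
  with Lexer.Eof -> ()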