
Compiler Design — Complete Notes CS Sem 6

Q: What is the difference between a compiler and an interpreter?

A compiler translates the whole program into machine code before it runs (faster execution). An interpreter executes the program statement by statement (slower, but easier to debug). Java uses both: javac compiles source code to bytecode, and the JVM interprets that bytecode (JIT-compiling frequently executed parts to native code).

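Q: What is the difference between a token, a lexeme and a pattern?

Lexeme: the actual character sequence in the source (e.g., 'count'). Token: the category that lexeme belongs to (e.g., IDENTIFIER). Pattern: the rule describing which lexemes form a token, usually a regular expression (e.g., [a-zA-Z_][a-zA-Z0-9_]* for identifiers).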

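Q: What is the difference between LL and LR parsers?

LL parser: scans input Left to right and builds a Leftmost derivation (top-down). LR parser: scans input Left to right and builds a Rightmost derivation in reverse (bottom-up). LR parsers are more powerful: every LL(1) grammar is also LR(1), and LR parsers also handle left-recursive grammars, which LL parsers cannot.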

✍️ WohoTech Team · 📅 Last Updated: 2026-03-11 · 📄 52 pages · 2.4 MB

Compiler Overview

A compiler translates source code (high-level) to target code (machine/assembly).

Compiler — Phases of Compilation

The six phases fall into two major parts:

Analysis (Front-end): Source → IR (phases 1-4)
Synthesis (Back-end): IR → Target code (phases 5-6)

Phase 1: Lexical Analysis

Reads source code character by character → produces tokens.

Source:  int count = 0;
Tokens:  [int, KEYWORD] [count, IDENTIFIER] [=, ASSIGN] [0, INTEGER] [;, SEMICOLON]

Token types: Keywords, Identifiers, Literals, Operators, Punctuation. Whitespace and comments are recognized but discarded; they are not passed to the parser as tokens.

Tools: Lex, Flex

Regular Expressions:

Identifier:  [a-zA-Z_][a-zA-Z0-9_]*
Integer:     [0-9]+
Float:       [0-9]+\.[0-9]+
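A minimal Python sketch of a regex-driven lexer, using the patterns above plus a few assumed ones (ASSIGN, SEMICOLON, a `//` comment rule and a small keyword set chosen for illustration). Keywords are first matched as identifiers and then reclassified, a common trick. Note that Python's `re` alternation takes the first alternative that matches, whereas Lex/Flex pick the longest match, so FLOAT is listed before INTEGER here.

```python
import re

# (token name, regex) pairs, tried in order. ASSIGN, SEMICOLON, COMMENT and
# the keyword set are illustrative assumptions.
TOKEN_SPEC = [
    ("FLOAT",      r"[0-9]+\.[0-9]+"),   # must come before INTEGER
    ("INTEGER",    r"[0-9]+"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("ASSIGN",     r"="),
    ("SEMICOLON",  r";"),
    ("COMMENT",    r"//[^\n]*"),         # recognized, then skipped
    ("SKIP",       r"[ \t\n]+"),         # whitespace, skipped
    ("MISMATCH",   r"."),                # any other character is an error
]
KEYWORDS = {"int", "float", "if", "else", "return"}
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs for `source`."""
    tokens = []
    for m in MASTER_RE.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at {m.start()}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"             # reserved word, not a user identifier
        tokens.append((lexeme, kind))
    return tokens


print(tokenize("int count = 0;"))
```

The call at the end produces the same five (lexeme, token) pairs as the `int count = 0;` example above.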

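Finite Automata (NFA and DFA):

DFA: deterministic, exactly one transition per state per input symbol
NFA: may have several transitions on the same symbol, plus ε-moves
NFA → DFA conversion via subset construction (each DFA state is a set of NFA states)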
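A sketch of subset construction in Python. The NFA encoding (a dict from `(state, symbol)` to a set of states, with `None` standing for ε) and the example NFA are assumptions for illustration. Each DFA state is the ε-closure of a set of NFA states, and a DFA state is accepting if it contains any accepting NFA state; missing transitions are treated as an implicit dead state.

```python
from collections import deque

EPS = None  # label used for ε-transitions in this encoding


def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(nfa, states, symbol):
    """NFA states reachable from `states` on one `symbol` transition."""
    return {t for s in states for t in nfa.get((s, symbol), ())}


def subset_construction(nfa, start, accepting, alphabet):
    """Return (dfa_start, dfa_transitions, dfa_accepting); DFA states are frozensets."""
    dfa_start = eps_closure(nfa, {start})
    dfa, seen, work = {}, {dfa_start}, deque([dfa_start])
    while work:
        S = work.popleft()
        for a in alphabet:
            T = eps_closure(nfa, move(nfa, S, a))
            if not T:
                continue                      # implicit dead state
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_start, dfa, dfa_accepting


def dfa_accepts(dfa, start, accepting, word):
    """Run the DFA on `word`."""
    state = start
    for ch in word:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Example NFA (assumed): strings over {a, b} that end in "ab".
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
start, dfa, accepting = subset_construction(NFA, 0, {2}, "ab")
```

For this NFA the construction yields three DFA states, {0}, {0, 1} and {0, 2}, with {0, 2} accepting.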

Phase 2: Syntax Analysis (Parsing)

Takes the token stream, checks it against the grammar, and builds a Parse Tree / AST.

Context-Free Grammar (CFG):

S → if E then S else S
S → id = E
E → E + T | T
T → T * F | F
F → (E) | id | num

Derivations:

Leftmost derivation: always expand leftmost non-terminal
Rightmost derivation: always expand rightmost non-terminal

Top-Down Parsing

Starts from start symbol, builds tree downward.

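Recursive Descent: one function per non-terminal
LL(1): uses 1 token of lookahead; the production to apply is chosen with FIRST and FOLLOW sets
Top-down parsers cannot use left-recursive rules such as E → E + T directly; rewrite them first, e.g. E → T E', E' → + T E' | ε (similarly for T)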
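A minimal recursive-descent parser in Python for the expression part of the grammar above (E, T, F). Because E → E + T and T → T * F are left-recursive, a top-down parser would recurse forever on them, so this sketch uses the equivalent form E → T E', E' → + T E' | ε (and likewise for T). The E' and T' rules are written as `while` loops, which also keeps + and * left-associative. The tokenizer and the tuple-based tree format are assumptions.

```python
import re

EOF = "<EOF>"  # end-of-input marker; the tokenizer below can never produce it


class Parser:
    """E -> T E'   E' -> + T E' | ε   T -> F T'   T' -> * F T' | ε   F -> ( E ) | id | num"""

    def __init__(self, text):
        # Hypothetical tokenizer: numbers, identifiers, or any single non-space character.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else EOF

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.E()
        if self.peek() != EOF:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def E(self):                      # E -> T E'
        node = self.T()
        while self.peek() == "+":     # E' -> + T E' | ε, written as a loop
            self.expect("+")
            node = ("+", node, self.T())
        return node

    def T(self):                      # T -> F T'
        node = self.F()
        while self.peek() == "*":     # T' -> * F T' | ε, written as a loop
            self.expect("*")
            node = ("*", node, self.F())
        return node

    def F(self):                      # F -> ( E ) | id | num
        tok = self.peek()
        if tok == "(":
            self.expect("(")
            node = self.E()
            self.expect(")")
            return node
        if tok.isdecimal():
            self.pos += 1
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            self.pos += 1
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("a + b * c").parse())
```

Operator precedence falls out of the grammar: `*` is handled one level deeper (in T) than `+`, so `b * c` becomes a subtree of the `+` node.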

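FIRST(A) = set of terminals that begin strings derivable from A (includes ε if A can derive the empty string)
FOLLOW(A) = set of terminals that can appear immediately after A in some sentential form ($, the end marker, is in FOLLOW of the start symbol)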
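A fixed-point computation of FIRST and FOLLOW, sketched in Python for the expression grammar with left recursion removed (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id | num). The grammar encoding (a dict of productions, `[]` for an ε-production, `"ε"` and `"$"` as markers) is an assumption. Both sets are grown repeatedly until no set changes.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"], ["num"]],
}
START = "E"


def first_of_seq(seq, first):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}   # terminal: FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of_seq(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")              # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue        # only non-terminals have FOLLOW sets
                    rest = first_of_seq(prod[i + 1:], first)
                    # What can follow `sym`: FIRST of the rest, plus FOLLOW(nt) if the rest can vanish.
                    new = (rest - {EPS}) | (follow[nt] if EPS in rest else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, START, first)
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

For this grammar the result is FIRST(E) = FIRST(T) = FIRST(F) = { (, id, num }, FIRST(E') = { +, ε }, FIRST(T') = { *, ε }, FOLLOW(E) = FOLLOW(E') = { ), $ }, FOLLOW(T) = FOLLOW(T') = { +, ), $ } and FOLLOW(F) = { +, *, ), $ }.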

Bottom-Up Parsing

Starts from tokens, reduces to start symbol.

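LR(0), SLR(1), LALR(1), CLR(1): all are shift-reduce parsers driven by an LR automaton (ACTION and GOTO tables); in increasing order of power, LR(0) < SLR(1) < LALR(1) < CLR(1)
LALR(1): most commonly used in practice (Yacc, Bison), because its tables have as many states as SLR(1) tables while accepting more grammars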
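To show how a shift-reduce parser is driven by its tables, here is a Python sketch with a hand-built SLR(1) table for a tiny left-recursive grammar chosen for illustration: (1) E → E + id, (2) E → id, augmented with E' → E. Generators like Yacc and Bison build such tables automatically; the driver loop is the part all the LR variants share.

```python
# Productions: number -> (left-hand side, length of right-hand side)
PRODUCTIONS = {
    1: ("E", 3),   # E -> E + id
    2: ("E", 1),   # E -> id
}

# ACTION[state][terminal] = ("s", next_state) | ("r", production) | ("acc",)
# Reduces appear on FOLLOW(E) = {+, $}, as in SLR(1).
ACTION = {
    0: {"id": ("s", 2)},
    1: {"+": ("s", 3), "$": ("acc",)},
    2: {"+": ("r", 2), "$": ("r", 2)},
    3: {"id": ("s", 4)},
    4: {"+": ("r", 1), "$": ("r", 1)},
}
GOTO = {0: {"E": 1}}


def lr_parse(tokens):
    """Shift-reduce parse of `tokens`; returns the production numbers reduced, in order."""
    stack = [0]                        # stack of LR automaton states
    tokens = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        action = ACTION[state].get(tok)
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if action[0] == "s":           # shift: push the new state, consume the token
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":         # reduce: pop |rhs| states, then GOTO on the lhs
            lhs, length = PRODUCTIONS[action[1]]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(action[1])
        else:                          # accept
            return reductions


print(lr_parse(["id", "+", "id", "+", "id"]))
```

For `id + id + id` the reductions come out as 2, 1, 1. Read backwards they give E ⇒ E + id ⇒ E + id + id ⇒ id + id + id, a rightmost derivation, which is why LR parsing is described as a rightmost derivation in reverse.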

Phase 3: Semantic Analysis

Checks meaning and type consistency.

int a = "hello";    // type mismatch error
b = a + c;          // undeclared variable error
int add(int x) { return x + 1; }
add(1, 2, 3);       // wrong number of arguments

Symbol Table — stores identifier info (name, type, scope, address)
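A minimal scoped symbol table in Python, plus two hypothetical check functions that catch the kinds of errors shown above (undeclared name, type mismatch, wrong argument count). Scopes are kept as a stack of dicts so inner declarations shadow outer ones; the fields stored per entry are illustrative, and Python's NameError/TypeError merely stand in for compiler diagnostics.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]                       # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        if name in self.scopes[-1]:              # same name twice in one scope
            raise NameError(f"redeclaration of {name!r}")
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):      # innermost declaration wins
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared variable {name!r}")


def check_assign(var_type, value_type):
    """Hypothetical check for e.g. int a = "hello";"""
    if var_type != value_type:
        raise TypeError(f"type mismatch: cannot assign {value_type} to {var_type}")


def check_call(table, fname, args):
    """Hypothetical check for e.g. add(1, 2, 3) when add takes one parameter."""
    params = table.lookup(fname)["params"]
    if len(args) != len(params):
        raise TypeError(f"{fname} expects {len(params)} argument(s), got {len(args)}")


table = SymbolTable()
table.declare("add", {"type": "int", "params": ["int"]})   # int add(int x)
table.enter_scope()                                        # body of add
table.declare("x", {"type": "int"})
print(table.lookup("x"))                                   # found in the inner scope
table.exit_scope()                                         # x is no longer visible
```

After `exit_scope()`, looking up `x` raises the "undeclared variable" error, while `add` is still found in the global scope.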

Type Checking:

Static type checking (compile time) — C, Java
Dynamic type checking (runtime) — Python, JavaScript

Attribute Grammar — extends CFG with semantic rules

Phase 4: Intermediate Code Generation

Translates AST to platform-independent IR.

Three-Address Code (TAC):

a = b + c * d   →   t1 = c * d
                    t2 = b + t1
                    a  = t2
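A sketch of TAC generation in Python. To stay self-contained it borrows Python's own `ast` module as the parser, so the input must be valid Python syntax; only names, constants and `+ - * /` are handled. A post-order walk emits one instruction per operator and allocates a fresh temporary t1, t2, ... for each result.

```python
import ast
import itertools

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def gen_tac(stmt):
    """Three-address code for one assignment such as 'a = b + c * d'."""
    code, counter = [], itertools.count(1)

    def expr(node):
        """Emit code for `node`; return the name or temporary holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = expr(node.left), expr(node.right)   # operands first
            temp = f"t{next(counter)}"
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise NotImplementedError(ast.dump(node))

    assign = ast.parse(stmt).body[0]
    if not isinstance(assign, ast.Assign):
        raise ValueError("expected a single assignment")
    code.append(f"{assign.targets[0].id} = {expr(assign.value)}")
    return code


for line in gen_tac("a = b + c * d"):
    print(line)
```

This reproduces the three instructions above. For `x = (a + b) * (a + b)` it emits `t1 = a + b` and `t2 = a + b` separately, which is exactly the redundancy that common subexpression elimination removes later.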

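Quadruples: (operator, arg1, arg2, result)
Triples: (operator, arg1, arg2); there is no result field, so a computed value is referred to by the index of the triple that produces it
SSA Form: Static Single Assignment (each variable is assigned exactly once; φ-functions merge values where control-flow paths join)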
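The statement a = b + c * d in both representations, sketched in Python. The conversion assumes every non-assignment quadruple writes a compiler temporary; in the triple form those temporaries disappear and operands point to the index of the triple that computed them, written here as "(0)", "(1)".

```python
# Quadruples for  a = b + c * d :  (op, arg1, arg2, result)
quads = [
    ("*", "c", "d", "t1"),
    ("+", "b", "t1", "t2"),
    ("=", "t2", None, "a"),
]


def quads_to_triples(quads):
    """Drop the result field; refer to temporaries by the index of their triple."""
    triples, index_of = [], {}

    def ref(x):
        return f"({index_of[x]})" if x in index_of else x

    for op, arg1, arg2, result in quads:
        if op == "=":
            triples.append(("=", result, ref(arg1)))   # assignment: (=, target, value)
        else:
            index_of[result] = len(triples)           # temporary lives at this position
            triples.append((op, ref(arg1), ref(arg2)))
    return triples


for i, t in enumerate(quads_to_triples(quads)):
    print(i, t)
```

Triples save the space of the result field, but because instructions are named by position, moving a triple during optimization means updating every reference to it; quadruples avoid that.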

Phase 5: Code Optimization

Improve IR without changing semantics.

Machine-independent optimizations:

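Constant Folding: 2 + 3 → 5 at compile time
Dead Code Elimination: remove code that is unreachable or whose result is never used
Common Subexpression Elimination (CSE): if a+b is computed twice with unchanged operands, compute it once and reuse the result
Copy Propagation: after x = y, replace later uses of x with y (x = y; z = x becomes x = y; z = y, and x = y can then be removed if x is dead)
Loop Invariant Code Motion: move computations whose value does not change across iterations out of the loop
Strength Reduction: replace expensive operations with cheaper ones, e.g. x*2 → x+x or x<<1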
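Constant folding can be sketched with Python's `ast.NodeTransformer`, using Python syntax as a stand-in source language (Python 3.9+ is needed for `ast.unparse`). Children are visited first, so nested constant expressions collapse bottom-up. Division is deliberately left out so that something like `1 / 0` is never evaluated at compile time.

```python
import ast
import operator

# Division is left out so that e.g. 1 / 0 is never evaluated at compile time.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace a BinOp whose operands are both numeric constants by its value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)                  # fold the children first (bottom-up)
        op = FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (op is not None
                and isinstance(left, ast.Constant) and isinstance(left.value, (int, float))
                and isinstance(right, ast.Constant) and isinstance(right.value, (int, float))):
            folded = ast.Constant(op(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    """Parse `source`, fold constants, and return the rewritten source text."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("y = 2 * 3 + z"))
```

`fold("y = 2 * 3 + z")` gives `y = 6 + z`: the constant part is folded, and the part involving `z` is left for runtime.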

Machine-dependent:

Register allocation
Instruction selection
Instruction scheduling

Phase 6: Code Generation

IR → Target machine code/assembly.

Tasks:

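Instruction selection (IR → ASM): choose target instructions for each IR operation
Register allocation: decide which values stay in the limited set of registers
Instruction ordering (scheduling): reorder instructions, e.g. to avoid pipeline stalls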
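A deliberately naive code generator in Python for TAC lines of the form `x = y op z` and `x = y`. The target is a hypothetical load/store machine (LOAD, STORE, ADD, SUB, MUL on registers R1 and R2), not a real ISA. Every TAC line is translated independently with a fixed pattern: instruction selection with no register allocation at all.

```python
# Hypothetical target instructions:
#   LOAD Rd, x     STORE x, Rs     ADD/SUB/MUL Rd, Rs   (Rd = Rd op Rs)
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def codegen(tac):
    """Naive instruction selection: each TAC line maps to a fixed instruction pattern."""
    asm = []
    for line in tac:
        target, expr = (s.strip() for s in line.split("=", 1))
        parts = expr.split()
        if len(parts) == 1:                          # copy: x = y
            asm += [f"LOAD R1, {parts[0]}", f"STORE {target}, R1"]
        else:                                        # binary: x = y op z
            y, op, z = parts
            asm += [f"LOAD R1, {y}", f"LOAD R2, {z}",
                    f"{OPCODES[op]} R1, R2", f"STORE {target}, R1"]
    return asm


for instr in codegen(["t1 = c * d", "t2 = b + t1", "a = t2"]):
    print(instr)
```

On the TAC for a = b + c * d this yields ten instructions; it stores t1 to memory and reloads it shortly afterwards (and does the same with t2). Keeping such temporaries in registers is precisely the job of register allocation.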

Register Allocation:

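Graph Coloring: build an interference graph (one node per variable, an edge between two variables that are live at the same time); assigning K registers is then a K-coloring problem
Spilling: when the graph cannot be colored with K colors, some variables are kept in memory and loaded/stored around their uses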
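A simplified graph-coloring allocator in Python, in the spirit of Chaitin's simplify/select scheme with optimistic spilling. Nodes with fewer than K neighbours are removed first, since they can always be colored later; if none exists, the highest-degree node is removed as a spill candidate. Colors (register numbers 0..K-1) are then assigned in reverse removal order. The interference-graph format (a symmetric dict of neighbour sets) is an assumption, and a real allocator would also insert loads/stores for spilled variables and rebuild the graph.

```python
def color_registers(interference, k):
    """Assign one of k registers to each variable; returns (assignment, spilled)."""
    # Work on a copy so removing nodes does not modify the caller's graph.
    graph = {v: set(ns) for v, ns in interference.items()}
    stack = []
    while graph:
        # Simplify: a node with fewer than k neighbours can always be colored later.
        node = next((v for v in graph if len(graph[v]) < k), None)
        if node is None:
            # None left: remove the highest-degree node as a potential spill (optimistic).
            node = max(graph, key=lambda v: len(graph[v]))
        for n in graph.pop(node):
            graph[n].discard(node)
        stack.append(node)

    assignment, spilled = {}, []
    while stack:                                  # select: color in reverse removal order
        node = stack.pop()
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)                  # this value stays in memory
    return assignment, spilled


# Assumed example: a, b, c are live at the same time; d overlaps only with a.
GRAPH = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(color_registers(GRAPH, 3))
print(color_registers(GRAPH, 2))
```

With K = 3 every variable gets a register. With K = 2 the triangle a, b, c cannot be colored, so exactly one variable (here `a`) ends up in `spilled`.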

Runtime Environments

Activation Record (Stack Frame):

High address ─────────────────
  Actual parameters
  Return address
  Saved registers
  Local variables
  Temporaries
Low address  ─────────────────

Storage allocation:

Static — global vars, fixed at compile time
Stack — local vars, function calls
Heap — dynamic allocation (malloc/new)
