Loading...
Loading...
A compiler translates source code (high-level) to target code (machine/assembly).
Two major phases:
Reads source code character by character → produces tokens.
Source: int count = 0;
Tokens: [int, KEYWORD] [count, IDENTIFIER] [=, ASSIGN] [0, INTEGER] [;, SEMICOLON]
Token types: Keywords, Identifiers, Literals, Operators, Punctuation, Comments (ignored)
Tools: Lex, Flex
Regular Expressions:
Identifier: [a-zA-Z_][a-zA-Z0-9_]*
Integer: [0-9]+
Float: [0-9]+\.[0-9]+
Finite Automata (DFA):
Tokens → Parse Tree / AST checking grammatical structure.
Context-Free Grammar (CFG):
S → if E then S else S
S → id = E
E → E + T | T
T → T * F | F
F → (E) | id | num
Derivations:
Starts from start symbol, builds tree downward.
FIRST(A) = set of terminals that begin strings derivable from A
FOLLOW(A) = set of terminals that can appear after A
Starts from tokens, reduces to start symbol.
Checks meaning and type consistency.
int a = "hello"; // type mismatch error
b = a + c; // undeclared variable error
int add(int x) { return x + 1; }
add(1, 2, 3); // wrong number of arguments
Symbol Table — stores identifier info (name, type, scope, address)
Type Checking:
Attribute Grammar — extends CFG with semantic rules
Translates AST to platform-independent IR.
Three-Address Code (TAC):
a = b + c * d → t1 = c * d
t2 = b + t1
a = t2
Quadruples: (operator, arg1, arg2, result)
Triples: (operator, arg1, arg2) — result is implicit
SSA Form — Static Single Assignment (each variable assigned once)
Improve IR without changing semantics.
Machine-independent optimizations:
2 + 3 → 5 at compile timea+b computed oncex = y; z = x with z = yx*2 → x+x or x<<1Machine-dependent:
IR → Target machine code/assembly.
Tasks:
Register Allocation:
Activation Record (Stack Frame):
High address ─────────────────
Actual parameters
Return address
Saved registers
Local variables
Temporaries
Low address ─────────────────
Storage allocation:
Compiler Design notes covering all phases — lexical analysis, parsing, semantic analysis, code generation, optimization with diagrams for B.Tech CS Sem 6.
52 pages · 2.4 MB · Updated 2026-03-11
Compiler poore program ko ek saath translate karta hai machine code mein (faster execution). Interpreter line-by-line execute karta hai (slower, easier debugging). Java both use karta hai — compiler → bytecode, JVM interprets.
Lexeme — actual string in source (e.g., 'count'). Token — category of lexeme (e.g., IDENTIFIER). Pattern — rule that describes a token (regex: [a-zA-Z][a-zA-Z0-9]*).
LL parser — Left to right scan, Leftmost derivation (top-down). LR parser — Left to right scan, Rightmost derivation (bottom-up). LR is more powerful.
Machine Learning Complete Notes — B.Tech CS Sem 6
Machine Learning
Software Engineering — Complete Notes with SDLC, Agile, Testing
Software Engineering
DBMS Complete Notes — B.Tech CS Sem 4
Database Management Systems
Engineering Mathematics 1 — Calculus, Matrices, Differential Equations
Engineering Mathematics 1
Programming Fundamentals Using C — Complete Notes
Programming Fundamentals (C)
Your feedback helps us improve notes and tutorials.