Lesson 0.1

How Compilers Work

The 10-minute mental model. After this, you'll understand every step your code goes through to become a running program.

10 min read No code required

A compiler is a translator.

Think about translating a book from Urdu to English. You'd do it in steps:

Read the words

Break text into individual words and punctuation. You don't care about meaning yet — just identifying each piece.

Understand grammar

Figure out which words form sentences. Which is the subject, which is the verb, which modifies what.

Check meaning

Does this sentence make sense? Can the subject actually do this action? Are there contradictions?

Write in English

Express the same meaning in the target language. Same ideas, different words.

That's a compiler: the same four steps, applied to programming languages instead of human languages. Monk source goes in, C source comes out. Same ideas, different syntax.

The four steps, concretely

Let's follow a single line of Monk code through the entire compiler:

let x = 1 + 2 * 3

Step 1: Lexer

Read characters, produce tokens

The lexer reads the source code character by character and groups them into tokens — the smallest meaningful units. It's like reading a sentence and identifying each word.

Input: raw text

l e t x = 1 + 2 * 3

Output: tokens

[LET] [IDENT: x] [EQUALS] [NUM: 1] [PLUS] [NUM: 2] [STAR] [NUM: 3]

The lexer doesn't care about meaning. It just knows let is a keyword, x is a name, 1 is a number, and + is an operator. Whitespace and comments are discarded.

Step 2: Parser

Read tokens, produce a tree

The parser reads the token stream and builds a tree that represents the structure of the program. This is called an Abstract Syntax Tree (AST).

The critical job is operator precedence. The parser knows that 2 * 3 must be computed before the addition 1 + (...), and the tree's shape encodes this.

Output: Abstract Syntax Tree

VariableDecl x
└── BinaryExpr +
    ├── Number 1
    └── BinaryExpr *
        ├── Number 2
        └── Number 3

Notice the tree shape. Multiply is nested inside Add, deeper in the tree. When we evaluate bottom-up, 2 * 3 = 6 happens first, then 1 + 6 = 7. The tree structure IS the precedence.
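One common way to produce this tree shape is precedence climbing. The Go sketch below is a hypothetical illustration, not necessarily the algorithm Monk's parser uses: it takes pre-split tokens for 1 + 2 * 3, builds the tree, prints it as an S-expression, and evaluates it bottom-up. Only + and * are supported, and the Node and parser types are assumptions for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Node is a hypothetical AST node: either a number leaf or a binary operation.
type Node struct {
	Op          string // "+", "*", or "" for a number leaf
	Value       int64  // set only for leaves
	Left, Right *Node
}

// precedence: a higher number binds tighter.
var precedence = map[string]int{"+": 1, "*": 2}

type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// parsePrimary reads a single number literal.
func (p *parser) parsePrimary() (*Node, error) {
	tok := p.peek()
	n, err := strconv.ParseInt(tok, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("expected number, got %q", tok)
	}
	p.pos++
	return &Node{Value: n}, nil
}

// parseExpr keeps consuming operators whose precedence is at least minPrec.
// The right-hand side is parsed with a higher minimum, so tighter operators
// end up deeper in the tree.
func (p *parser) parseExpr(minPrec int) (*Node, error) {
	left, err := p.parsePrimary()
	if err != nil {
		return nil, err
	}
	for {
		op := p.peek()
		prec, ok := precedence[op]
		if !ok || prec < minPrec {
			return left, nil
		}
		p.pos++
		right, err := p.parseExpr(prec + 1) // +1 makes operators left-associative
		if err != nil {
			return nil, err
		}
		left = &Node{Op: op, Left: left, Right: right}
	}
}

// String prints the tree as an S-expression.
func (n *Node) String() string {
	if n.Op == "" {
		return strconv.FormatInt(n.Value, 10)
	}
	return fmt.Sprintf("(%s %v %v)", n.Op, n.Left, n.Right)
}

// Eval evaluates bottom-up: children first, then the node itself.
func Eval(n *Node) int64 {
	switch n.Op {
	case "+":
		return Eval(n.Left) + Eval(n.Right)
	case "*":
		return Eval(n.Left) * Eval(n.Right)
	default:
		return n.Value
	}
}

func main() {
	p := &parser{toks: []string{"1", "+", "2", "*", "3"}}
	tree, err := p.parseExpr(1)
	if err != nil {
		panic(err)
	}
	fmt.Println(tree, "=", Eval(tree)) // (+ 1 (* 2 3)) = 7
}
```

Because the right-hand side must have strictly higher precedence, * pulls 2 * 3 into its own subtree, while a second + (as in 1 + 2 + 3) closes off the left subtree first and yields (+ (+ 1 2) 3).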

Step 3: Type Checker

Walk the tree, verify types make sense

The type checker walks the AST and verifies that every operation is valid. It annotates each node with its type.

2 * 3 — int * int = int. Valid.

1 + 6 — int + int = int. Valid.

let x = ... — x has no annotation, inferred as int. Valid.

Now imagine a mistake:

let x int = "hello"

Error: cannot assign string to int variable

The type checker catches this before the program runs. Better to find the bug now than in production.
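A checker for these two examples fits in a few dozen lines. In this hypothetical Go sketch, the node types (IntLit, StrLit, Binary, Let) and the two-type universe are assumptions for illustration; only the error text is taken from the message above.

```go
package main

import "fmt"

// Type is a hypothetical representation of Monk types; only two exist here.
type Type string

const (
	Int    Type = "int"
	String Type = "string"
)

// Expr is any expression node in this toy AST.
type Expr any

type IntLit struct{ Value int64 }
type StrLit struct{ Value string }
type Binary struct {
	Op          string // "+" or "*"
	Left, Right Expr
}

// Let is `let name [annotation] = value`; Annotation is "" when omitted.
type Let struct {
	Name       string
	Annotation Type
	Value      Expr
}

// checkExpr walks the expression bottom-up and returns its type.
func checkExpr(e Expr) (Type, error) {
	switch e := e.(type) {
	case IntLit:
		return Int, nil
	case StrLit:
		return String, nil
	case Binary:
		lt, err := checkExpr(e.Left)
		if err != nil {
			return "", err
		}
		rt, err := checkExpr(e.Right)
		if err != nil {
			return "", err
		}
		if lt != Int || rt != Int {
			return "", fmt.Errorf("operator %s needs int operands, got %s and %s", e.Op, lt, rt)
		}
		return Int, nil // int op int = int
	default:
		return "", fmt.Errorf("unknown expression %T", e)
	}
}

// checkLet infers the variable's type from its value, or verifies the
// value against an explicit annotation.
func checkLet(l Let) (Type, error) {
	vt, err := checkExpr(l.Value)
	if err != nil {
		return "", err
	}
	if l.Annotation == "" {
		return vt, nil // no annotation: infer from the value
	}
	if vt != l.Annotation {
		return "", fmt.Errorf("cannot assign %s to %s variable", vt, l.Annotation)
	}
	return l.Annotation, nil
}

func main() {
	// let x = 1 + 2 * 3
	ok := Let{Name: "x", Value: Binary{"+", IntLit{1}, Binary{"*", IntLit{2}, IntLit{3}}}}
	t, err := checkLet(ok)
	fmt.Println(t, err) // int <nil>

	// let x int = "hello"
	bad := Let{Name: "x", Annotation: Int, Value: StrLit{"hello"}}
	if _, err := checkLet(bad); err != nil {
		fmt.Println("Error:", err) // Error: cannot assign string to int variable
	}
}
```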

Step 4: Code Generator

Walk the tree, write C

The code generator walks the type-checked AST and outputs C source code. Each node type has a translation rule.

Monk

let x = 1 + 2 * 3

Generated C

int64_t x = 1 + 2 * 3;
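One way to implement "a translation rule per node type" is a type switch with one case per node kind. This Go sketch uses hypothetical node types and assumes the checker has already recorded each let's type. It adds parentheses only where C's own precedence would otherwise change the meaning, which is why its output matches the generated C above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical typed AST nodes; the checker has already run, so the
// generator only maps each node kind to C text.
type Expr any

type IntLit struct{ Value int64 }
type Binary struct {
	Op          string // "+" or "*"
	Left, Right Expr
}
type Let struct {
	Name  string
	Type  string // type recorded by the checker, e.g. "int"
	Value Expr
}

// cTypes maps Monk type names to C type names (only int in this sketch).
var cTypes = map[string]string{"int": "int64_t"}

// prec mirrors C's relative precedence for the supported operators.
var prec = map[string]int{"+": 1, "*": 2}

// genExpr emits C for an expression. A child gets parentheses only when
// C would otherwise group it differently (both operators are left-associative).
func genExpr(e Expr) string {
	switch e := e.(type) {
	case IntLit:
		return strconv.FormatInt(e.Value, 10)
	case Binary:
		left, right := genExpr(e.Left), genExpr(e.Right)
		if b, ok := e.Left.(Binary); ok && prec[b.Op] < prec[e.Op] {
			left = "(" + left + ")"
		}
		if b, ok := e.Right.(Binary); ok && prec[b.Op] <= prec[e.Op] {
			right = "(" + right + ")"
		}
		return left + " " + e.Op + " " + right
	default:
		panic(fmt.Sprintf("genExpr: unsupported node %T", e))
	}
}

// genLet emits one C declaration for a let binding.
func genLet(l Let) string {
	return fmt.Sprintf("%s %s = %s;", cTypes[l.Type], l.Name, genExpr(l.Value))
}

func main() {
	decl := Let{Name: "x", Type: "int",
		Value: Binary{"+", IntLit{1}, Binary{"*", IntLit{2}, IntLit{3}}}}
	fmt.Println(genLet(decl)) // int64_t x = 1 + 2 * 3;
}
```

For a tree representing (1 + 2) * 3, the same rule wraps the left operand, producing "(1 + 2) * 3", since + binds more loosely than * in C too.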

A more complex example — a function:

Monk

const add = (a int, b int) int {
    return a + b
}
show(add(3, 4))

Generated C

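#include <stdint.h>
#include "runtime.h"  /* Monk runtime header, assumed to declare monk_show */
int64_t add(int64_t a, int64_t b) {
    return a + b;
}
int main(void) { monk_show(add(3, 4)); return 0; }  /* C requires top-level statements inside main */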

The generated C is compiled by your system's C compiler (GCC, Clang) into a native binary. Done.

The complete picture

hello.monk  (you write this)
    ↓ lexer
Stream of tokens: LET, IDENT, EQUALS, NUM, ...
    ↓ parser
Tree of nodes: VariableDecl → BinaryExpr → ...
    ↓ checker
Same tree, now every node knows its type
    ↓ codegen
hello.c
    ↓ C compiler (GCC or Clang)
hello (native binary)

What Monk does NOT do

No bytecode or virtual machine.

Source goes straight to C. No intermediate bytecode format. No VM to maintain.

No runtime interpreter.

The output is a standalone native binary. No Monk runtime needed to execute it.

No LLVM.

LLVM is a 100+ MB dependency. We generate C instead and let your system's C compiler do the heavy lifting. LLVM can be added later as an optional optimization backend.

Why no REPL? Monk is purely a compiler. There's no interactive mode — you write a .monk file, compile it, run the binary. This keeps the toolchain simple: one path, one output format, no interpreter to maintain.

The files you'll write

The Monk compiler is built in Go. Here's what each package does:

syntax/ Lexer + Parser + AST (scanner, parser, ast, tokens)

codegen/ AST to C source code emitter

cmd/monk/ CLI entry point (monk build, monk run, monk check)

runtime/ Small C library linked into every Monk binary

runtime.h MonkValue, function signatures

runtime.c show(), math, string ops, array ops, error handling

Key takeaways

A compiler is four steps: lex (words), parse (grammar), check (meaning), generate (translate).

Each step transforms a simpler representation into a richer one. Characters to tokens to trees to typed trees to C code.

The AST's tree shape encodes operator precedence. Deeper nodes are evaluated first.

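Monk compiles to C, then C compiles to native code. No VM and no GC; the only runtime is the small C library linked into each binary, so nothing extra needs to be installed to run it.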