Lesson 0.1
How Compilers Work
The 10-minute mental model. After this, you'll understand every step your code goes through to become a running program.
A compiler is a translator.
Think about translating a book from Urdu to English. You'd do it in steps:
Read the words
Break text into individual words and punctuation. You don't care about meaning yet — just identifying each piece.
Understand grammar
Figure out which words form sentences. Which is the subject, which is the verb, which modifies what.
Check meaning
Does this sentence make sense? Can the subject actually do this action? Are there contradictions?
Write in English
Express the same meaning in the target language. Same ideas, different words.
That's a compiler. These four steps, applied to programming languages instead of human languages. Monk source goes in, C source comes out. Same ideas, different syntax.
The four steps, concretely
Let's follow a single line of Monk code through the entire compiler:
let x = 1 + 2 * 3
Step 1: Lexer
Read characters, produce tokens
The lexer reads the source code character by character and groups them into tokens — the smallest meaningful units. It's like reading a sentence and identifying each word.
Input: raw text, the characters of let x = 1 + 2 * 3
Output: the tokens let, x, =, 1, +, 2, *, 3
The lexer doesn't care about meaning. It just knows let is a keyword, x is a name, 1 is a number, and + is an operator. Whitespace and comments are discarded.
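To make this concrete, here is a minimal lexer sketch in Go, the language the Monk compiler is written in. It is not Monk's actual lexer. The token kinds (KEYWORD, IDENT, NUMBER, OP), the keyword set, and the function name lex are hypothetical, and comments are not handled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// TokenKind classifies a token. These kind names are hypothetical and
// chosen for illustration; Monk's real lexer may use different ones.
type TokenKind string

const (
	Keyword TokenKind = "KEYWORD"
	Ident   TokenKind = "IDENT"
	Number  TokenKind = "NUMBER"
	Op      TokenKind = "OP"
)

// Token is one meaningful unit of source text.
type Token struct {
	Kind TokenKind
	Text string
}

// keywords is a hypothetical keyword set covering this lesson's examples.
var keywords = map[string]bool{"let": true, "const": true, "return": true}

// lex groups characters into tokens and discards whitespace.
// Comment handling is left out of this sketch.
func lex(src string) ([]Token, error) {
	var toks []Token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		c := runes[i]
		switch {
		case unicode.IsSpace(c):
			i++ // whitespace separates tokens but is never a token itself
		case unicode.IsDigit(c):
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			toks = append(toks, Token{Number, string(runes[start:i])})
		case unicode.IsLetter(c) || c == '_':
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			word := string(runes[start:i])
			kind := Ident
			if keywords[word] {
				kind = Keyword // same shape as a name, but reserved
			}
			toks = append(toks, Token{kind, word})
		case strings.ContainsRune("=+-*/", c):
			toks = append(toks, Token{Op, string(c)})
			i++
		default:
			return nil, fmt.Errorf("unexpected character %q at position %d", c, i)
		}
	}
	return toks, nil
}

func main() {
	toks, err := lex("let x = 1 + 2 * 3")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, tok := range toks {
		fmt.Printf("%s(%s) ", tok.Kind, tok.Text)
	}
	fmt.Println()
	// KEYWORD(let) IDENT(x) OP(=) NUMBER(1) OP(+) NUMBER(2) OP(*) NUMBER(3)
}
```

Note that let and x are scanned by the same loop; only the keyword table decides which one is a keyword and which one is a name.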
Step 2: Parser
Read tokens, produce a tree
The parser reads the token stream and builds a tree that represents the structure of the program. This is called an Abstract Syntax Tree (AST).
The critical job here is operator precedence. The parser knows that 2 * 3 must be computed before its result is added to 1, and the tree's shape encodes this.
Output: Abstract Syntax Tree
let x =
    Add (+)
        1
        Multiply (*)
            2
            3
Notice the tree shape. Multiply is nested inside Add, deeper in the tree. When we evaluate bottom-up, 2 * 3 = 6 happens first, then 1 + 6 = 7. The tree structure IS the precedence.
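A common way to get precedence right is recursive descent with one precedence level per recursive call. The Go sketch below assumes that technique; Monk's parser may be structured differently, and the node types Num and Binary, the levels table, and the functions parse, eval, and show are hypothetical names. For brevity it takes tokens as plain strings and handles only + and *.

```go
package main

import (
	"fmt"
	"strconv"
)

type (
	Expr   interface{} // any AST node (node types here are hypothetical)
	Num    struct{ Value int64 }
	Binary struct {
		Op          string // "+" or "*"
		Left, Right Expr
	}
)

// levels lists operators from loosest to tightest binding.
var levels = []string{"+", "*"}

// parse builds an AST from tokens (plain strings here, for brevity).
func parse(toks []string) (Expr, error) {
	pos := 0
	peek := func() string {
		if pos < len(toks) {
			return toks[pos]
		}
		return "" // end of input
	}
	// level parses operators at levels[lvl]; its operands come from the
	// next, tighter level, so '*' nodes always end up below '+' nodes.
	var level func(lvl int) (Expr, error)
	level = func(lvl int) (Expr, error) {
		if lvl == len(levels) { // tightest level: a number literal
			n, err := strconv.ParseInt(peek(), 10, 64)
			if err != nil {
				return nil, fmt.Errorf("expected a number, got %q", peek())
			}
			pos++
			return Num{n}, nil
		}
		left, err := level(lvl + 1)
		if err != nil {
			return nil, err
		}
		for peek() == levels[lvl] {
			pos++
			right, err := level(lvl + 1)
			if err != nil {
				return nil, err
			}
			left = Binary{levels[lvl], left, right} // left-associative
		}
		return left, nil
	}
	e, err := level(0)
	if err != nil {
		return nil, err
	}
	if pos != len(toks) {
		return nil, fmt.Errorf("unexpected token %q", peek())
	}
	return e, nil
}

// eval works bottom-up: both children are evaluated before their parent.
func eval(e Expr) int64 {
	switch n := e.(type) {
	case Num:
		return n.Value
	case Binary:
		l, r := eval(n.Left), eval(n.Right)
		if n.Op == "*" {
			return l * r
		}
		return l + r
	}
	panic(fmt.Sprintf("unknown node %T", e))
}

// show renders the tree shape using the lesson's node names.
func show(e Expr) string {
	switch n := e.(type) {
	case Num:
		return strconv.FormatInt(n.Value, 10)
	case Binary:
		name := map[string]string{"+": "Add", "*": "Multiply"}[n.Op]
		return fmt.Sprintf("%s(%s, %s)", name, show(n.Left), show(n.Right))
	}
	return "?"
}

func main() {
	tree, err := parse([]string{"1", "+", "2", "*", "3"})
	if err != nil {
		panic(err)
	}
	fmt.Println(show(tree)) // Add(1, Multiply(2, 3))
	fmt.Println(eval(tree)) // 7
}
```

Because each level asks the next, tighter level for its operands, a * can never sit above a + in the tree. Adding an operator with a new precedence means inserting it into levels at the right position.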
Step 3: Type Checker
Walk the tree, verify types make sense
The type checker walks the AST and verifies that every operation is valid. It annotates each node with its type.
2 * 3 — int * int = int. Valid.
1 + 6 — int + int = int. Valid.
let x = ... — x has no annotation, inferred as int. Valid.
Now imagine a mistake:
let x int = "hello"
Error: cannot assign string to int variable
The type checker catches this before the program runs. Better to find the bug now than in production.
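The checks above can be sketched in Go as a recursive function that computes each node's type from its children. This is an illustration under assumptions, not Monk's checker: only int and string exist, + and * accept only int operands, and the names Type, typeOf, checkLet, IntLit, StrLit, and Let are hypothetical. The error text mirrors the message in the example above.

```go
package main

import "fmt"

// Type names a Monk type; only int and string are modeled in this sketch.
type Type string

const (
	IntT    Type = "int"
	StringT Type = "string"
)

// Hypothetical AST node types.
type (
	Expr   interface{}
	IntLit struct{ Value int64 }
	StrLit struct{ Value string }
	Binary struct {
		Op          string
		Left, Right Expr
	}
	// Let is `let Name [Annot] = Value`; Annot is "" when omitted.
	Let struct {
		Name  string
		Annot Type
		Value Expr
	}
)

// typeOf computes an expression's type bottom-up. A real checker would
// also record the result on each node; this sketch only returns it.
func typeOf(e Expr) (Type, error) {
	switch n := e.(type) {
	case IntLit:
		return IntT, nil
	case StrLit:
		return StringT, nil
	case Binary:
		lt, err := typeOf(n.Left)
		if err != nil {
			return "", err
		}
		rt, err := typeOf(n.Right)
		if err != nil {
			return "", err
		}
		// Assumption: + and * are defined only for int operands.
		if lt != IntT || rt != IntT {
			return "", fmt.Errorf("cannot apply %s to %s and %s", n.Op, lt, rt)
		}
		return IntT, nil // int op int = int
	}
	return "", fmt.Errorf("unknown expression %T", e)
}

// checkLet infers the variable's type when there is no annotation and
// otherwise requires the value's type to match the annotation.
func checkLet(l Let) (Type, error) {
	vt, err := typeOf(l.Value)
	if err != nil {
		return "", err
	}
	if l.Annot == "" {
		return vt, nil // no annotation: inferred from the value
	}
	if vt != l.Annot {
		return "", fmt.Errorf("cannot assign %s to %s variable", vt, l.Annot)
	}
	return l.Annot, nil
}

func main() {
	// let x = 1 + 2 * 3
	good := Let{"x", "", Binary{"+", IntLit{1}, Binary{"*", IntLit{2}, IntLit{3}}}}
	typ, err := checkLet(good)
	fmt.Println(typ, err) // int <nil>

	// let x int = "hello"
	_, err = checkLet(Let{"x", IntT, StrLit{"hello"}})
	fmt.Println(err) // cannot assign string to int variable
}
```

The key design point is that typeOf never looks at a node's parent: every type is derived from the children, which is why the checker can validate 2 * 3 before it ever reaches the addition.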
Step 4: Code Generator
Walk the tree, write C
The code generator walks the type-checked AST and outputs C source code. Each node type has a translation rule.
Monk
let x = 1 + 2 * 3
Generated C
int64_t x = 1 + 2 * 3;
A more complex example, a function:
Monk
const add = (a int, b int) int {
    return a + b
}
show(add(3, 4))
Generated C
int64_t add(int64_t a, int64_t b) {
    return a + b;
}
monk_show(add(3, 4));
The generated C is compiled by your system's C compiler (GCC, Clang) into a native binary. Done.
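Each translation rule can be a small Go function per node type. The sketch below assumes the AST has already been type-checked (each Let carries its inferred type) and maps Monk's int to int64_t as in the output above. The node types and the names cType, prec, genExpr, genLet, and genFunc are hypothetical, and monk_show is not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical AST node types, assumed to be already type-checked.
type (
	Expr   interface{}
	Num    struct{ Value int64 }
	Var    struct{ Name string }
	Binary struct {
		Op          string // "+" or "*"
		Left, Right Expr
	}
	Let struct {
		Name, Type string // Type was filled in by the type checker
		Value      Expr
	}
	Param struct{ Name, Type string }
	// Func models `const Name = (Params) Ret { return Body }`.
	Func struct {
		Name   string
		Params []Param
		Ret    string
		Body   Expr
	}
)

// cType maps a Monk type to C. int -> int64_t follows the lesson's output;
// other types are out of scope for this sketch.
func cType(t string) string {
	if t == "int" {
		return "int64_t"
	}
	panic("unsupported type: " + t)
}

// prec gives each operator's binding strength ('*' binds tighter than '+').
func prec(op string) int {
	if op == "*" {
		return 2
	}
	return 1
}

// genExpr emits a C expression. It adds parentheses only where C's own
// precedence rules would otherwise regroup the tree.
func genExpr(e Expr) string {
	switch n := e.(type) {
	case Num:
		return fmt.Sprint(n.Value)
	case Var:
		return n.Name
	case Binary:
		l, r := genExpr(n.Left), genExpr(n.Right)
		if b, ok := n.Left.(Binary); ok && prec(b.Op) < prec(n.Op) {
			l = "(" + l + ")"
		}
		if b, ok := n.Right.(Binary); ok && prec(b.Op) <= prec(n.Op) {
			r = "(" + r + ")"
		}
		return l + " " + n.Op + " " + r
	}
	panic(fmt.Sprintf("unknown node %T", e))
}

// genLet applies the rule: let name = value  ->  <C type> name = value;
func genLet(l Let) string {
	return fmt.Sprintf("%s %s = %s;", cType(l.Type), l.Name, genExpr(l.Value))
}

// genFunc applies the rule for a function whose body is a single return.
func genFunc(f Func) string {
	params := make([]string, len(f.Params))
	for i, p := range f.Params {
		params[i] = cType(p.Type) + " " + p.Name
	}
	return fmt.Sprintf("%s %s(%s) {\n    return %s;\n}",
		cType(f.Ret), f.Name, strings.Join(params, ", "), genExpr(f.Body))
}

func main() {
	// let x = 1 + 2 * 3
	decl := Let{"x", "int", Binary{"+", Num{1}, Binary{"*", Num{2}, Num{3}}}}
	fmt.Println(genLet(decl)) // int64_t x = 1 + 2 * 3;

	// const add = (a int, b int) int { return a + b }
	add := Func{"add", []Param{{"a", "int"}, {"b", "int"}}, "int", Binary{"+", Var{"a"}, Var{"b"}}}
	fmt.Println(genFunc(add))
}
```

Because C already gives * higher precedence than +, the tree for 1 + 2 * 3 comes out unchanged, while a tree for (1 + 2) * 3 keeps its parentheses.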
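The complete picture
.monk source → Lexer → tokens → Parser → AST → Type checker → typed AST → Code generator → C source → C compiler (GCC, Clang) → native binary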
What Monk does NOT do
No bytecode or virtual machine.
Source goes straight to C. No intermediate bytecode format. No VM to maintain.
No runtime interpreter.
The output is a standalone native binary. No Monk runtime needed to execute it.
No LLVM.
LLVM is a 100+ MB dependency. We generate C instead and let your system's C compiler do the heavy lifting. LLVM can be added later as an optional optimization backend.
Why no REPL? Monk is purely a compiler. There's no interactive mode — you write a .monk file, compile it, run the binary. This keeps the toolchain simple: one path, one output format, no interpreter to maintain.
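The final step of that single path can be sketched as a Go function that writes the generated C to a temporary file and invokes the system C compiler. The command name cc, the -O2 flag, and the helper names ccCommand and buildC are assumptions for illustration; the C program in main is a hand-written stand-in for generated output, using printf in place of monk_show.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// ccCommand builds the C compiler invocation. "cc" and -O2 are assumptions;
// the lesson only says the system compiler (GCC or Clang) does this step.
func ccCommand(cFile, binary string) *exec.Cmd {
	return exec.Command("cc", "-O2", "-o", binary, cFile)
}

// buildC writes generated C to a temporary file and compiles it into binary.
func buildC(cSource, binary string) error {
	dir, err := os.MkdirTemp("", "monk-build")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir) // the .c file is an intermediate; only the binary is kept

	cFile := filepath.Join(dir, "out.c")
	if err := os.WriteFile(cFile, []byte(cSource), 0o644); err != nil {
		return err
	}
	cmd := ccCommand(cFile, binary)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr // surface compiler diagnostics
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("C compiler failed: %w", err)
	}
	return nil
}

func main() {
	// Hand-written stand-in for generated C; printf replaces monk_show.
	src := `#include <stdint.h>
#include <stdio.h>

int64_t add(int64_t a, int64_t b) {
    return a + b;
}

int main(void) {
    printf("%lld\n", (long long)add(3, 4));
    return 0;
}
`
	if err := buildC(src, "./add"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("built ./add")
}
```

Once the build succeeds, ./add is an ordinary native executable: running it prints 7 without any Monk tooling present.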
The files you'll write
The Monk compiler is built in Go. Here's what each package does:
Key takeaways
A compiler is four steps: lex (words), parse (grammar), check (meaning), generate (translate).
Each step transforms a simpler representation into a richer one. Characters to tokens to trees to typed trees to C code.
The AST's tree shape encodes operator precedence. Deeper nodes are evaluated first.
Monk compiles to C, then C compiles to native. No VM, no GC, no runtime.