Build a programming language from scratch.
You'll go from zero to a working compiler that produces native binaries. Along the way, you'll learn compiler design, C, and how programming languages actually work.
Pipeline: .monk → tokens → AST → typed AST → .c file → native binary
Unit 0 — The Ground Floor
How compilers work. Project setup. Then we build.
Unit 1 — The Lexer
Break source code into tokens. Characters in, meaningful chunks out.
What Is a Lexer?
Characters flow in, tokens flow out. The word splitter.
Designing Monk's Token Set
Every keyword, operator, delimiter mapped to a token type.
Building the Lexer
Step by step: scanning, keywords, literals, line tracking. TDD.
References & Further Reading
Crafting Interpreters Ch. 4, lexer blog posts.
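To make "characters in, tokens out" concrete, here is a minimal scanner sketch in C. It is not Monk's lexer: the token kinds, the Token and Lexer structs, and the function names are all hypothetical, and keywords are not yet separated from identifiers. It only illustrates scanning, literals, and line tracking.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; Monk's real token set is much larger. */
typedef enum { TOK_NUMBER, TOK_IDENT, TOK_OP, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* points into the source buffer, not copied */
    size_t length;     /* number of characters in the token */
    int line;          /* 1-based line where the token starts */
} Token;

typedef struct {
    const char *cur;   /* next unread character */
    int line;          /* current 1-based line number */
} Lexer;

static void lexer_init(Lexer *lx, const char *source) {
    lx->cur = source;
    lx->line = 1;
}

static Token next_token(Lexer *lx) {
    /* Skip whitespace, counting newlines for line tracking. */
    while (*lx->cur == ' ' || *lx->cur == '\t' ||
           *lx->cur == '\r' || *lx->cur == '\n') {
        if (*lx->cur == '\n') lx->line++;
        lx->cur++;
    }

    Token tok = { TOK_EOF, lx->cur, 0, lx->line };
    if (*lx->cur == '\0') return tok;

    unsigned char c = (unsigned char)*lx->cur;
    if (isdigit(c)) {
        /* Integer literal: one or more digits. */
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*lx->cur)) lx->cur++;
    } else if (isalpha(c) || c == '_') {
        /* Identifier (a real lexer would check a keyword table here). */
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*lx->cur) || *lx->cur == '_') lx->cur++;
    } else {
        /* Anything else is treated as a one-character operator. */
        tok.kind = TOK_OP;
        lx->cur++;
    }
    tok.length = (size_t)(lx->cur - tok.start);
    return tok;
}
```

Calling next_token repeatedly until it returns TOK_EOF yields the whole token stream. Storing a pointer and length instead of copying text keeps the sketch allocation-free, which is one common design choice, not necessarily the one the course makes.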
Unit 2 — The Parser
Transform tokens into a tree. Understand grammar and precedence.
What Is a Parser?
Tokens in, tree out. The sentence diagrammer.
Operator Precedence
Why 1+2*3 is 7, not 9. 13 levels of precedence.
Designing Monk's AST
29 node types. Expressions vs statements. Assignment is a statement.
Building the Parser
Recursive descent, one grammar rule at a time. TDD.
References & Further Reading
Crafting Interpreters Ch. 5-8, Pratt parsing.
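The precedence lesson can be previewed with a small precedence-climbing sketch in C. It is a simplification under stated assumptions: it models only two precedence levels rather than Monk's 13, handles only +, - and * on non-negative integers, and evaluates directly instead of building an AST. All function names are hypothetical.

```c
#include <ctype.h>

/* Binding power of a binary operator; 0 means "not an operator". */
static int precedence(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*':           return 2;
    default:            return 0;
    }
}

static void skip_spaces(const char **p) {
    while (**p == ' ') (*p)++;
}

/* Reads a run of decimal digits and returns its value. */
static long parse_number(const char **p) {
    long value = 0;
    skip_spaces(p);
    while (isdigit((unsigned char)**p)) {
        value = value * 10 + (**p - '0');
        (*p)++;
    }
    return value;
}

/* Parses and evaluates operators whose precedence is at least min_prec. */
static long parse_expr(const char **p, int min_prec) {
    long left = parse_number(p);
    for (;;) {
        skip_spaces(p);
        char op = **p;
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec) return left;
        (*p)++;
        /* Asking for prec + 1 on the right makes operators left-associative. */
        long right = parse_expr(p, prec + 1);
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        }
    }
}

static long evaluate(const char *text) {
    return parse_expr(&text, 1);
}
```

With this sketch, evaluate("1+2*3") returns 7 because the right operand of + is parsed with a higher minimum precedence, so 2*3 is grouped first.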
Unit 3 — The C Runtime
The small C library inside every Monk binary. Values, builtins, error handling.
Why a Runtime Library?
The small C library that lives inside every Monk binary.
Values and Types in Memory
MonkValue: the tagged union. Stack vs heap. Type tags.
Value Semantics
Deep copy on assign. No GC. Your data is yours.
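A tagged union with deep copy on assign might look roughly like the sketch below. The Value type, its three tags, and the helper names are assumptions made for illustration; the real MonkValue layout is covered in the unit itself and is likely to differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags; the real MonkValue has more kinds. */
typedef enum { VAL_INT, VAL_BOOL, VAL_STRING } ValueTag;

typedef struct {
    ValueTag tag;            /* says which union member is valid */
    union {
        long as_int;
        int as_bool;
        char *as_string;     /* heap-allocated, owned by this value */
    } u;
} Value;

static Value value_int(long n) {
    Value v;
    v.tag = VAL_INT;
    v.u.as_int = n;
    return v;
}

/* Copies `text` into fresh heap memory owned by the new value. */
static Value value_string(const char *text) {
    Value v;
    size_t n = strlen(text) + 1;
    v.tag = VAL_STRING;
    v.u.as_string = malloc(n);
    if (v.u.as_string == NULL) abort();
    memcpy(v.u.as_string, text, n);
    return v;
}

/* Deep copy: heap data is duplicated, so the copy never aliases the source. */
static Value value_copy(Value src) {
    if (src.tag == VAL_STRING) return value_string(src.u.as_string);
    return src; /* scalars live inline, a plain struct copy is enough */
}

/* Without a GC, each owner frees its own heap data exactly once. */
static void value_free(Value *v) {
    if (v->tag == VAL_STRING) {
        free(v->u.as_string);
        v->u.as_string = NULL;
    }
}
```

Because value_copy duplicates the string buffer, mutating the copy leaves the original untouched, which is the property "your data is yours" describes.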
Built-in Functions
40+ functions: math, strings, arrays, file I/O, type checking.
Error Handling
guard/against/throw via setjmp/longjmp.
References & Further Reading
Tagged unions, value semantics, setjmp/longjmp resources.
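As a rough illustration of how guard/against/throw can sit on top of setjmp/longjmp, consider the C sketch below. The handler variables and function names are hypothetical, and the way Monk's generated code actually arranges its jump buffers is an assumption left to the unit; only setjmp, longjmp, and abort are real library calls here.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical runtime state: the innermost active guard and last error. */
static jmp_buf *current_guard = NULL;
static const char *current_error = NULL;

/* "throw": record the message and jump back to the active guard. */
static void sketch_throw(const char *message) {
    current_error = message;
    if (current_guard != NULL) longjmp(*current_guard, 1);
    abort(); /* no guard is active, so the error is fatal */
}

static int safe_divide(int a, int b) {
    if (b == 0) sketch_throw("division by zero");
    return a / b;
}

/* Returns a / b, or `fallback` when the guarded call throws. */
static int divide_or(int a, int b, int fallback) {
    jmp_buf guard;
    jmp_buf *saved = current_guard; /* remember the outer guard for nesting */
    int result;

    current_guard = &guard;
    if (setjmp(guard) == 0) {
        result = safe_divide(a, b); /* the "guard" body */
    } else {
        result = fallback;          /* the "against" body, reached via longjmp */
    }
    current_guard = saved;          /* restore the outer guard on both paths */
    return result;
}
```

setjmp returns 0 when first called and a nonzero value when control comes back through longjmp, which is what lets a single if/else stand in for the two halves of a guard block.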
Unit 4 — Code Generation
Walk the AST, write C. Translation rules, function hoisting, source mapping.
What Is Code Generation?
The final step. Walk the tree, write C. Readable output.
Translation Rules
How each Monk construct becomes C. The systematic mapping.
Function Hoisting
Why functions are lifted above main(). The two-pass approach.
Building the Code Generator
Emitter pattern, temp variables, source mapping, real bugs.
References & Further Reading
Compile-to-C strategies, lcc, Chicken Scheme.
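"Walk the tree, write C" can be sketched with a tiny expression emitter. The Node struct and the emit_expr and append helpers below are hypothetical and cover only number literals and binary operators; Monk's real translation rules handle far more constructs.

```c
#include <stdio.h>
#include <string.h>

typedef enum { NODE_NUMBER, NODE_BINARY } NodeKind;

/* Hypothetical AST node for this sketch only. */
typedef struct Node {
    NodeKind kind;
    long number;                      /* used by NODE_NUMBER */
    char op;                          /* used by NODE_BINARY */
    const struct Node *left, *right;  /* used by NODE_BINARY */
} Node;

/* Appends `text` to the NUL-terminated buffer `out`, truncating at `cap`. */
static void append(char *out, size_t cap, const char *text) {
    size_t used = strlen(out);
    if (used + 1 < cap) strncat(out, text, cap - used - 1);
}

/* Appends the C source for `node` to `out`. Binary nodes are always
   parenthesised, so the output does not rely on C's precedence rules. */
static void emit_expr(const Node *node, char *out, size_t cap) {
    if (node->kind == NODE_NUMBER) {
        char digits[32];
        snprintf(digits, sizeof digits, "%ld", node->number);
        append(out, cap, digits);
        return;
    }
    char op_text[4] = { ' ', node->op, ' ', '\0' };
    append(out, cap, "(");
    emit_expr(node->left, out, cap);
    append(out, cap, op_text);
    emit_expr(node->right, out, cap);
    append(out, cap, ")");
}
```

For the tree of 1 + (2 * 3), this produces the text "(1 + (2 * 3))". Emitting explicit parentheses trades a little readability for output that is correct regardless of how the source language's precedence differs from C's.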
Unit 5 — The CLI
monk build, monk run, monk check. The user-facing tool that ties it all together.
Unit 6 — The Type System
Static analysis before codegen. Type inference, structural records, all-paths return, and scalar unboxing to C parity.
Why Add a Type System?
What breaks without one. What Monk's checker catches before runtime.
Monk's Type Model
Nine kinds, optional flag, AssignableTo. The whole model in three parts.
Building the Checker
Scope, hoisting, all-paths return, and the bugs we found building it.
Scalar Unboxing
Eliminating MonkValue overhead. How Monk reached C parity on scalar benchmarks.
References & Further Reading
Type systems, static analysis, V8 hidden classes, TAPL.
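The all-paths-return check mentioned above can be illustrated with a short recursive walk over statements. This is a sketch under assumptions: the Stmt struct and its four kinds are hypothetical, loops and throws are lumped into STMT_OTHER, and Monk's checker may apply different rules.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { STMT_RETURN, STMT_IF, STMT_BLOCK, STMT_OTHER } StmtKind;

/* Hypothetical statement node for this sketch only. */
typedef struct Stmt {
    StmtKind kind;
    const struct Stmt *then_branch;   /* STMT_IF */
    const struct Stmt *else_branch;   /* STMT_IF, NULL when there is no else */
    const struct Stmt *const *body;   /* STMT_BLOCK: array of statements */
    size_t body_len;                  /* STMT_BLOCK: number of statements */
} Stmt;

/* True when every execution path through `s` ends in a return. */
static bool always_returns(const Stmt *s) {
    switch (s->kind) {
    case STMT_RETURN:
        return true;
    case STMT_IF:
        /* An if without an else can fall through, so both arms must return. */
        return s->else_branch != NULL
            && always_returns(s->then_branch)
            && always_returns(s->else_branch);
    case STMT_BLOCK:
        /* A block returns if any statement in it is guaranteed to return. */
        for (size_t i = 0; i < s->body_len; i++) {
            if (always_returns(s->body[i])) return true;
        }
        return false;
    default:
        return false;
    }
}
```

A checker would run this on each function body whose declared return type is not empty and report an error when it yields false.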
Unit 7 — The Module System
Split programs across files. use/export syntax, dependency graphs, cycle detection, topological ordering.
Why a Module System?
What breaks when every program is one file. The updated compilation pipeline.
Monk's Module Syntax
The four import forms. export. How names cross file boundaries.
Building the Resolver
DFS, the Graph struct, cycle detection, export validation. How src/module/ works.
References & Further Reading
Module systems, topological sort, DFS cycle detection.
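DFS-based cycle detection and topological ordering fit in a few lines of C. The sketch below assumes a small adjacency-matrix Graph with a fixed module limit; the actual Graph struct in src/module/ is not shown in this outline and may look quite different.

```c
#include <stdbool.h>

#define MAX_MODULES 16

/* Hypothetical graph shape for this sketch only. */
typedef struct {
    int count;                               /* number of modules in use */
    bool imports[MAX_MODULES][MAX_MODULES];  /* imports[a][b]: a uses b */
} Graph;

typedef enum { UNVISITED, IN_PROGRESS, DONE } Mark;

/* Depth-first visit. Returns false as soon as an import cycle is found. */
static bool visit(const Graph *g, int m, Mark *marks, int *order, int *n) {
    if (marks[m] == DONE) return true;
    if (marks[m] == IN_PROGRESS) return false; /* back edge means a cycle */
    marks[m] = IN_PROGRESS;
    for (int dep = 0; dep < g->count; dep++) {
        if (g->imports[m][dep] && !visit(g, dep, marks, order, n)) return false;
    }
    marks[m] = DONE;
    order[(*n)++] = m; /* all dependencies of m are already in `order` */
    return true;
}

/* Fills `order` with a compile order in which dependencies come first.
   Returns false when the import graph contains a cycle. */
static bool build_order(const Graph *g, int *order) {
    Mark marks[MAX_MODULES] = { UNVISITED };
    int n = 0;
    for (int m = 0; m < g->count; m++) {
        if (!visit(g, m, marks, order, &n)) return false;
    }
    return true;
}
```

Appending a module only after all of its imports have been visited is what makes the resulting order topological: each file is compiled after everything it uses.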
C FFI
Calling C from Monk. Linking external libraries.
Linter & Formatter
monk fmt, monk lint. Code style and static analysis tooling.
LSP & Editor Support
Language server protocol. VS Code extension. Syntax highlighting.
Distribution
Homebrew, installers, cross-compilation. Shipping the compiler.
What you need before starting
Experience in at least one programming language, such as TypeScript, Python, Go, or Java. You know variables, functions, loops.
You can run commands, navigate directories, use git.
C is not needed upfront. It is taught in Unit 3 (runtime) and Unit 4 (codegen).
The stack you'll build
Lexer + Parser + Type Checker
Generates C source code
Built-in functions
Value types, error handling
Zero runtime dependencies
Compiled via system cc