Lesson 5.2

The Compilation Pipeline

What happens between monk build hello.monk and a running binary. Every step, every file.

monk concepts 12 min read

From text to machine code in seven steps.

A compilation pipeline is an assembly line. Raw material enters one end (your source code), passes through a series of stations (lexer, parser, codegen, C compiler), and a finished product rolls off the other end (a native binary). Each station transforms the material into a new shape. No station skips ahead or looks back.

Let's trace a real program through every station. We'll start with the simplest possible Monk file and follow it all the way to an executable.

The program we're tracing.

show("Hello, world!")

One line. One function call. But to produce a working binary from this, the compiler does seven distinct things. Let's walk through each one.

Step 1: Read the source file.

The CLI reads hello.monk from disk into a string. Nothing fancy -- just os.ReadFile. If the file doesn't exist or can't be read, the CLI prints an error and exits with code 1.
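As a rough illustration of this step, here is a minimal Go sketch. Only os.ReadFile and the exit code 1 come from the lesson; the function name readSource, the usage message, and the error wording are assumptions made for the example, not the actual Monk CLI code.

```go
package main

import (
	"fmt"
	"os"
)

// readSource loads a source file into a string.
// The name is hypothetical; the lesson only says the CLI uses os.ReadFile.
func readSource(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("cannot read %s: %w", path, err)
	}
	return string(data), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: readsrc <file.monk>")
		return
	}
	src, err := readSource(os.Args[1])
	if err != nil {
		// Missing or unreadable file: report and exit with code 1.
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes\n", len(src))
}
```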

Step 2: Lex -- source text to tokens.

The scanner walks through the string character by character and produces a flat list of tokens:

IDENTIFIER "show"
LPAREN    "("
STRING    "Hello, world!"
RPAREN    ")"
EOF

Each token carries its type, its literal value, and its position (line and column). The position data flows all the way through the pipeline -- it's how error messages know where to point.
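To make the token list concrete, here is a small Go sketch of a scanner that handles just enough syntax for our one-line program. The token type names come from the listing above; the Token struct layout, the ILLEGAL token, and the lex function are assumptions for illustration and say nothing about how Monk's real scanner is written.

```go
package main

import "fmt"

type TokenType string

const (
	IDENTIFIER TokenType = "IDENTIFIER"
	LPAREN     TokenType = "LPAREN"
	STRING     TokenType = "STRING"
	RPAREN     TokenType = "RPAREN"
	EOF        TokenType = "EOF"
	ILLEGAL    TokenType = "ILLEGAL" // hypothetical: marks input the sketch cannot scan
)

// Token carries its type, literal value, and position (1-based line and column).
type Token struct {
	Type    TokenType
	Literal string
	Line    int
	Col     int
}

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// lex walks the source byte by byte and returns a flat token list ending in EOF.
func lex(src string) []Token {
	var toks []Token
	line, col, i := 1, 1, 0
	for i < len(src) {
		c := src[i]
		switch {
		case c == '\n':
			i, line, col = i+1, line+1, 1
		case c == ' ' || c == '\t' || c == '\r':
			i, col = i+1, col+1
		case c == '(':
			toks = append(toks, Token{LPAREN, "(", line, col})
			i, col = i+1, col+1
		case c == ')':
			toks = append(toks, Token{RPAREN, ")", line, col})
			i, col = i+1, col+1
		case c == '"':
			// Scan to the closing quote; escapes are not handled in this sketch.
			j := i + 1
			for j < len(src) && src[j] != '"' && src[j] != '\n' {
				j++
			}
			if j < len(src) && src[j] == '"' {
				toks = append(toks, Token{STRING, src[i+1 : j], line, col})
				j++ // consume the closing quote
			} else {
				toks = append(toks, Token{ILLEGAL, src[i:j], line, col})
			}
			col += j - i
			i = j
		case isLetter(c):
			j := i
			for j < len(src) && (isLetter(src[j]) || isDigit(src[j])) {
				j++
			}
			toks = append(toks, Token{IDENTIFIER, src[i:j], line, col})
			col += j - i
			i = j
		default:
			toks = append(toks, Token{ILLEGAL, string(c), line, col})
			i, col = i+1, col+1
		}
	}
	return append(toks, Token{EOF, "", line, col})
}

func main() {
	for _, t := range lex(`show("Hello, world!")`) {
		fmt.Printf("%-10s %q (line %d, col %d)\n", t.Type, t.Literal, t.Line, t.Col)
	}
}
```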

Step 3: Parse -- tokens to AST.

The parser reads the token stream and builds an abstract syntax tree. For our one-line program, the AST is a single CallExpr node with the function name show and one argument (a string literal).

The parser also checks structure. Missing parentheses, unexpected tokens, incomplete statements -- all caught here. If there are parse errors, the CLI reports them all (not just the first one) and exits. No point generating code from a broken tree.
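The "report them all" behavior can be sketched in Go as follows. This is a toy parser that only understands calls of the form name("string"), and its recovery rule (skip the rest of the offending line) is an assumption chosen for the example; the lesson does not say how Monk's parser recovers. The type and function names are hypothetical as well.

```go
package main

import "fmt"

// Token mirrors the lexer output described above; the field names are assumptions.
type Token struct {
	Type, Literal string
	Line, Col     int
}

// CallExpr is the only AST node this sketch knows about.
type CallExpr struct {
	Name string
	Args []string // string-literal arguments only
}

type parser struct {
	toks []Token
	pos  int
	errs []string
}

func (p *parser) peek() Token { return p.toks[p.pos] }

// advance moves to the next token but never steps past EOF.
func (p *parser) advance() {
	if p.toks[p.pos].Type != "EOF" {
		p.pos++
	}
}

// expect consumes a token of the given type, or records an error and consumes nothing.
func (p *parser) expect(typ string) (Token, bool) {
	t := p.peek()
	if t.Type != typ {
		p.errs = append(p.errs,
			fmt.Sprintf("%d:%d: error: expected %s, got %s", t.Line, t.Col, typ, t.Type))
		return t, false
	}
	p.advance()
	return t, true
}

func (p *parser) parseCall() (CallExpr, bool) {
	name, ok := p.expect("IDENTIFIER")
	if !ok {
		return CallExpr{}, false
	}
	if _, ok := p.expect("LPAREN"); !ok {
		return CallExpr{}, false
	}
	arg, ok := p.expect("STRING")
	if !ok {
		return CallExpr{}, false
	}
	if _, ok := p.expect("RPAREN"); !ok {
		return CallExpr{}, false
	}
	return CallExpr{Name: name.Literal, Args: []string{arg.Literal}}, true
}

// Parse returns every call it could build plus every error it found.
func Parse(toks []Token) ([]CallExpr, []string) {
	if len(toks) == 0 || toks[len(toks)-1].Type != "EOF" {
		return nil, []string{"internal error: token stream must end with EOF"}
	}
	p := &parser{toks: toks}
	var calls []CallExpr
	for p.peek().Type != "EOF" {
		line := p.peek().Line
		if call, ok := p.parseCall(); ok {
			calls = append(calls, call)
			continue
		}
		// Recovery: drop the rest of the broken line, then keep parsing.
		for p.peek().Type != "EOF" && p.peek().Line == line {
			p.advance()
		}
	}
	return calls, p.errs
}

func main() {
	// Tokens for two broken lines: `show "a")` and `show("b"`.
	broken := []Token{
		{"IDENTIFIER", "show", 1, 1}, {"STRING", "a", 1, 6}, {"RPAREN", ")", 1, 9},
		{"IDENTIFIER", "show", 2, 1}, {"LPAREN", "(", 2, 5}, {"STRING", "b", 2, 6},
		{"EOF", "", 2, 9},
	}
	_, errs := Parse(broken)
	for _, e := range errs {
		fmt.Println(e)
	}
}
```

Because the parser keeps going after the first failure, both broken lines are reported in one pass, which is the behavior described above.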

Step 4: Generate C -- AST to C source.

The code generator walks the AST and emits C code. For our program, the output looks roughly like this:

#include "runtime.h"

int main(void) {
    monk_show(monk_new_string("Hello, world!"));
    return 0;
}

The show() call in Monk becomes monk_show() in C. The string literal becomes a monk_new_string() call that creates a runtime value. The #include "runtime.h" at the top brings in all the type definitions and function declarations the generated code needs.
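Here is one way the AST-to-C step could look in Go, as a sketch only. The monk_show and monk_new_string names and the runtime.h include come from the output above. The CallExpr shape, the rule "prefix every function name with monk_", and the cString escaping helper are assumptions for the example; the real code generator may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// CallExpr is a simplified AST node: a function name and string-literal arguments.
type CallExpr struct {
	Name string
	Args []string
}

// cString renders a Go string as a C string literal.
// It escapes only backslash, quote, newline, and tab (enough for this sketch).
func cString(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`, "\t", `\t`)
	return `"` + r.Replace(s) + `"`
}

// generateC walks the calls in order and emits one C statement per call.
func generateC(calls []CallExpr) string {
	var b strings.Builder
	b.WriteString("#include \"runtime.h\"\n\n")
	b.WriteString("int main(void) {\n")
	for _, c := range calls {
		args := make([]string, len(c.Args))
		for i, a := range c.Args {
			// Each string literal becomes a runtime value.
			args[i] = "monk_new_string(" + cString(a) + ")"
		}
		// Assumption: a Monk function f maps to a C function monk_f.
		fmt.Fprintf(&b, "    monk_%s(%s);\n", c.Name, strings.Join(args, ", "))
	}
	b.WriteString("    return 0;\n}\n")
	return b.String()
}

func main() {
	program := []CallExpr{{Name: "show", Args: []string{"Hello, world!"}}}
	fmt.Print(generateC(program))
}
```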

If the output path passed to -o ends with .c, the pipeline stops here. The C source is written to that path and the CLI exits. This is useful for debugging the compiler itself -- you can read the C output and see exactly what the code generator produced.

Step 5: Write temp files and embed the runtime.

To produce a binary, the generated C has to go through a real C compiler, and that compiler is invoked on files on disk, not on strings held in the CLI's memory. So the CLI:

1. Creates a temp directory in the OS temp space.
2. Writes the generated C source to a temp .c file.
3. Writes runtime.h and runtime.c to the same temp directory.

Where do the runtime files come from? They're embedded in the Go binary itself. Go's embed package lets you bake files into the compiled Go executable at build time. When you install Monk, you get a single binary that carries its own runtime inside it. No separate runtime installation, no "runtime not found" errors.
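The temp-directory step might look something like this in Go. To keep the sketch compilable without extra files, the runtime sources are placeholder strings in a map; in a real build they would come from Go's embed package (a //go:embed directive next to the variable). The function names, the generated.c file name, and the "monk-build-*" directory pattern are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runtimeFiles stands in for the embedded runtime. These are placeholders;
// the real contents would be baked into the binary with the embed package.
var runtimeFiles = map[string]string{
	"runtime.h": "/* placeholder for the embedded runtime header */\n",
	"runtime.c": "/* placeholder for the embedded runtime source */\n",
}

// writeBuildDir creates a temp directory and writes the generated C source
// plus the runtime files into it. It returns the directory and the file paths.
func writeBuildDir(generatedC string) (string, []string, error) {
	dir, err := os.MkdirTemp("", "monk-build-*")
	if err != nil {
		return "", nil, err
	}
	files := map[string]string{"generated.c": generatedC}
	for name, content := range runtimeFiles {
		files[name] = content
	}
	var paths []string
	for name, content := range files {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, []byte(content), 0o644); err != nil {
			os.RemoveAll(dir) // do not leave a half-written directory behind
			return "", nil, err
		}
		paths = append(paths, p)
	}
	return dir, paths, nil
}

// removeBuildDir deletes the temp directory and everything in it.
func removeBuildDir(dir string) error { return os.RemoveAll(dir) }

// fileExists reports whether a path is present on disk.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	dir, paths, err := writeBuildDir("int main(void) { return 0; }\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	defer removeBuildDir(dir)
	for _, p := range paths {
		fmt.Println("wrote", p)
	}
}
```

Keeping all three files in one directory means a single -I flag is enough for the C compiler to find runtime.h, and a single recursive delete removes everything afterwards.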

Step 6: Invoke the C compiler.

The CLI calls the system's C compiler to turn the temp files into a binary:

cc -o hello temp/generated.c temp/runtime.c -I temp/ -lm

By default, Monk uses cc (the system's default C compiler). If you set the CC environment variable, Monk respects it. Want to use Clang? CC=clang monk build hello.monk. Want a cross-compiler? Set CC accordingly.
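The compiler selection described above could be sketched in Go like this. The fallback to cc, the CC variable, and the argument list mirror the lesson; the function names are hypothetical, and the sketch assumes CC holds a single program name with no extra arguments.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// compilerName returns the value of CC if it is set, otherwise "cc".
// getenv is injected so the lookup can be exercised without touching the real environment.
func compilerName(getenv func(string) string) string {
	if cc := getenv("CC"); cc != "" {
		return cc
	}
	return "cc"
}

// compileArgs builds the argument list shown in the command above.
func compileArgs(output, tempDir string) []string {
	return []string{
		"-o", output,
		filepath.Join(tempDir, "generated.c"),
		filepath.Join(tempDir, "runtime.c"),
		"-I", tempDir,
		"-lm",
	}
}

func main() {
	cc := compilerName(os.Getenv)
	args := compileArgs("hello", "temp")
	// A real CLI would now run something like exec.Command(cc, args...).
	fmt.Println(cc, strings.Join(args, " "))
}
```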

This is where all the optimization happens. The C compiler has decades of optimization work behind it -- dead code elimination, register allocation, loop unrolling, vectorization. Monk gets all of it for free by targeting C instead of emitting machine code directly. How much of it is applied depends on the optimization level the C compiler is invoked with (for example -O2).

Two compilations happen: Monk to C (by the Monk compiler, written in Go), then C to binary (by your system's C compiler). Only the second one optimizes.

Step 7: Clean up and deliver.

After the C compiler succeeds, the CLI deletes the temp directory. The user gets a clean native binary -- no .c files left behind, no .o files, no build artifacts. Just the executable.

$ monk build hello.monk
$ ls
hello      hello.monk
$ ./hello
Hello, world!

What monk run does differently.

monk run executes the same pipeline as monk build, with three differences:

1. The binary is built to a temp location (not the current directory).
2. The binary is executed immediately, with its stdout/stderr connected to your terminal.
3. After execution, the temp binary is deleted.

$ monk run hello.monk
Hello, world!
$ ls
hello.monk    # no binary left behind
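The execute-then-delete part of monk run can be sketched with Go's os/exec package. The function name runAndRemove and the choice to pass the program's exit code through are assumptions for the example; the lesson only states that the binary runs with stdout/stderr attached to the terminal and is deleted afterwards.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
)

// runAndRemove runs the binary at binaryPath, streams its output to the given
// writers, and deletes the binary afterwards whether or not the run succeeded.
// It returns the program's exit code, or -1 and an error if it could not start.
// binaryPath should be a full path (a temp file), not a bare command name.
func runAndRemove(binaryPath string, stdout, stderr io.Writer) (int, error) {
	defer os.Remove(binaryPath)

	cmd := exec.Command(binaryPath)
	cmd.Stdin = os.Stdin
	cmd.Stdout = stdout
	cmd.Stderr = stderr

	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The program ran but exited with a non-zero status.
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: runtemp <path-to-temp-binary>")
		return
	}
	code, err := runAndRemove(os.Args[1], os.Stdout, os.Stderr)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	os.Exit(code)
}
```

Putting the delete in a defer means the temp binary is removed even when the program crashes or exits with an error.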

This is directly inspired by cargo run in Rust and go run in Go. During development, you don't care about the binary -- you just want to see the output. monk run is the fastest path from edit to feedback.

What monk check skips.

monk check runs only steps 1-3: read file, lex, parse. No codegen, no C compilation, no temp files. It validates that your code is syntactically correct and exits.

This makes it fast -- essentially instant for any reasonable file size. No waiting for the C compiler, no disk I/O for temp files. Fast enough that an editor can run it on every keystroke, and a CI pipeline can run it on every commit without adding meaningful build time.

$ monk check hello.monk          # valid -- no output, exit 0
$ monk check broken.monk
broken.monk:3:1: error: expected "end" to close function
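The file:line:column format in that error message is easy to produce once every token carries its position. A minimal Go sketch, assuming a hypothetical Diagnostic type and assuming that a failed check exits non-zero (the lesson only states exit 0 for the valid case):

```go
package main

import (
	"fmt"
	"strings"
)

// Diagnostic is a hypothetical record of one parse error and where it occurred.
type Diagnostic struct {
	File    string
	Line    int
	Col     int
	Message string
}

// String formats the diagnostic as file:line:col: error: message.
func (d Diagnostic) String() string {
	return fmt.Sprintf("%s:%d:%d: error: %s", d.File, d.Line, d.Col, d.Message)
}

// checkResult renders all diagnostics, one per line, and picks an exit code:
// 0 with no output when the file is valid, 1 (an assumption) otherwise.
func checkResult(diags []Diagnostic) (string, int) {
	if len(diags) == 0 {
		return "", 0
	}
	lines := make([]string, len(diags))
	for i, d := range diags {
		lines[i] = d.String()
	}
	return strings.Join(lines, "\n") + "\n", 1
}

func main() {
	diags := []Diagnostic{
		{File: "broken.monk", Line: 3, Col: 1, Message: `expected "end" to close function`},
	}
	out, code := checkResult(diags)
	fmt.Print(out)
	fmt.Println("exit code:", code)
}
```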

The full picture.

Source    -- show("Hello, world!")
Tokens    -- IDENTIFIER "show", LPAREN, STRING "Hello, world!", RPAREN, EOF
AST       -- CallExpr(name: "show", args: [StringLit("Hello, world!")])
C Source  -- monk_show(monk_new_string("Hello, world!"))
Binary    -- ./hello (native executable)

Five representations of the same program. Each one is a step closer to the machine. The compiler's job is to translate faithfully between each pair -- never losing meaning, never adding meaning.

Key takeaways

1. The pipeline has seven steps: read, lex, parse, codegen, write temps, C compile, clean up.
2. The runtime is embedded in the Go binary and extracted to temp files for compilation. Users install one binary, not a toolchain.
3. Monk uses the system's C compiler (cc or CC env var) for the final compilation. All optimization comes from the C compiler.
4. monk run = build + execute + delete. monk check = read + lex + parse only. Different stopping points on the same pipeline.