Lesson 4.1

What Is Code Generation?

The final compiler step. Walk the tree, write C. Every AST node has a translation rule.


Translation, not transformation.

Think about translating a book from English to Spanish. You don't change the story. You don't rearrange the chapters. You express the same ideas in a different language, following that language's grammar and conventions.

Code generation is exactly this. The AST is your outline -- the structure of the program, fully understood. The code generator walks that outline node by node and writes the equivalent C code. Same program, different language.

By the time we reach code generation, the hard work is done. The lexer found the tokens. The parser built the tree. The checker (when it exists) verified types. Now we just need to express the same program in C, and let a C compiler handle the rest.
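The node-by-node walk described above can be sketched in a few lines of C. This is a minimal illustration, not Monk's actual generator: the Node struct, the NodeKind enum, and the append and emit_expr helpers are all hypothetical names invented for this example. Only monk_int, monk_string, and monk_add come from the lesson itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for illustration; Monk's real node types are not shown here. */
typedef enum { NODE_INT, NODE_STRING, NODE_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    long int_value;            /* NODE_INT payload */
    const char *text;          /* NODE_STRING payload */
    const struct Node *left;   /* NODE_ADD operands */
    const struct Node *right;
} Node;

/* Append text to out without writing past cap bytes (out must hold a valid string). */
static void append(char *out, size_t cap, const char *text) {
    size_t used = strlen(out);
    assert(used < cap);
    strncat(out, text, cap - used - 1);
}

/* One translation rule per node kind; children are emitted recursively. */
static void emit_expr(const Node *n, char *out, size_t cap) {
    char number[32];
    switch (n->kind) {
    case NODE_INT:
        snprintf(number, sizeof number, "%ld", n->int_value);
        append(out, cap, "monk_int(");
        append(out, cap, number);
        append(out, cap, ")");
        break;
    case NODE_STRING:
        /* Sketch only: assumes the text contains nothing that needs escaping. */
        append(out, cap, "monk_string(\"");
        append(out, cap, n->text);
        append(out, cap, "\")");
        break;
    case NODE_ADD:
        append(out, cap, "monk_add(");
        emit_expr(n->left, out, cap);
        append(out, cap, ", ");
        emit_expr(n->right, out, cap);
        append(out, cap, ")");
        break;
    }
}
```

Calling emit_expr on a NODE_ADD node whose children are the integers 2 and 3, with an empty output buffer, leaves the text monk_add(monk_int(2), monk_int(3)) in that buffer. The shape of the output mirrors the shape of the tree, which is the sense in which this is translation rather than transformation.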

Why C and not something else.

Monk compiles to C, not LLVM IR, not assembly, not bytecode. This is a deliberate choice with real trade-offs.

1. Zero dependencies. Nearly every development machine already has a C compiler or can get one from its package manager. No LLVM installation, no JVM, no runtime download. Just cc.

2. Inspectable output. You can open the generated .c file and read it. Try reading LLVM IR or JVM bytecode -- it's possible, but not pleasant.

3. Free optimizations. GCC and Clang have decades of optimization passes. Monk gets them all for free by targeting C. No need to write our own optimizer.

4. Universal portability. C compilers exist for almost every platform, from microcontrollers to mainframes, so Monk can reach any target its runtime can be built for.

You can read the generated C. Running monk build -o hello.c hello.monk writes the C source to hello.c. Open it. It's not obfuscated -- it's just C.

A simple program, two languages.

Here's a Monk program:

let greeting = "hello world"
show(greeting)

And here's the C code the generator produces:

#include "runtime.h"

int main(int argc, char *argv[]) {
    monk_init(argc, argv);

    #line 1 "hello.monk"
    MonkValue greeting = monk_string("hello world");
    #line 2 "hello.monk"
    monk_show(greeting);

    monk_free(greeting);
    return 0;
}

Every line of Monk maps to one or more lines of C. The structure is preserved. The meaning is preserved. The only new things are C-specific bookkeeping: including the runtime header, initializing the runtime, freeing memory at the end, and #line directives for source mapping.
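The effect of a #line directive is easy to see in isolation. The sketch below is standard C and does not depend on the Monk runtime; the function names reported_line and reported_file are made up for the demonstration. After a #line directive, the compiler numbers the following line as stated and attributes it to the named file, which is what lets compiler diagnostics and debuggers point at hello.monk instead of the generated .c file.

```c
/* Returns the line number the compiler believes the return statement is on. */
static int reported_line(void) {
#line 7 "hello.monk"
    return __LINE__;
}

/* Returns the file name the compiler believes this code came from. */
static const char *reported_file(void) {
#line 2 "hello.monk"
    return __FILE__;
}
```

Here reported_line() returns 7 and reported_file() returns "hello.monk", regardless of where the functions physically sit in the C file.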

The anatomy of a generated C file.

Every generated C file follows the same structure, regardless of how complex the Monk program is:

#include "runtime.h"

// 1. Hoisted functions (from Monk function expressions)
static MonkValue _fn_add(MonkValue a, MonkValue b) {
    // function body (here: a + b, lowered to a runtime call)
    return monk_add(a, b);
}

// 2. Main function (everything else)
int main(int argc, char *argv[]) {
    monk_init(argc, argv);

    // Program body: declarations, expressions, control flow
    MonkValue x = monk_int(42);
    MonkValue result = _fn_add(x, monk_int(8));
    monk_show(result);

    // Cleanup
    monk_free(x);
    monk_free(result);
    return 0;
}

1. Runtime include. The runtime header provides MonkValue, all the monk_* functions, and memory management.

2. Hoisted functions. All function expressions are lifted out and placed above main() as static C functions. More on this in lesson 4.3.

3. Program body. The rest of the program -- variable declarations, function calls, control flow -- goes inside main().

4. Cleanup. Every MonkValue allocated in main() gets freed before returning.
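One way a generator could guarantee this ordering is to collect hoisted functions, body statements, and cleanup statements in separate buffers during the tree walk, then stitch them together at the end. The sketch below assumes that approach; emit_program and its parameters are hypothetical, and Monk's real generator may assemble its output differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the four sections in their fixed order.
 * Returns what snprintf returns: the length the full text needs,
 * which is >= cap if the output was truncated. */
static int emit_program(char *out, size_t cap, const char *hoisted,
                        const char *body, const char *cleanup) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap,
                    "#include \"runtime.h\"\n"        /* 1. runtime include   */
                    "\n"
                    "%s"                               /* 2. hoisted functions */
                    "\n"
                    "int main(int argc, char *argv[]) {\n"
                    "    monk_init(argc, argv);\n"
                    "%s"                               /* 3. program body      */
                    "%s"                               /* 4. cleanup           */
                    "    return 0;\n"
                    "}\n",
                    hoisted, body, cleanup);
}
```

Because a function expression can appear anywhere in the Monk source, buffering the sections separately lets the walk append to the hoisted section the moment it meets one, without disturbing the statements already emitted for main().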

Why everything goes through the runtime.

You might wonder: if x is an integer, why not just emit int x = 42; in C? Why wrap everything in MonkValue?

Because Monk values aren't raw C types. A MonkValue is a tagged union -- it knows its own type at runtime. When you write x + y in Monk, the runtime checks that both values are numbers before adding them. If one is a string, it produces a proper error instead of silently corrupting memory.

This is why x + y becomes monk_add(x, y), not x + y in the generated C. Every operation goes through a runtime function that enforces type safety and proper value semantics. The C code is the vehicle, but the runtime is the engine.
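To make the tagged-union idea concrete, here is a minimal sketch of a checked addition. It is not the real runtime: the Value struct, the Tag enum, and the value_* functions are stand-ins invented for this example, and the actual MonkValue layout in runtime.h may differ. The sketch also assumes the error is reported as a returned value, whereas the real runtime may report it some other way.

```c
/* Illustrative layout only; the real MonkValue in runtime.h may differ. */
typedef enum { TAG_INT, TAG_STRING, TAG_ERROR } Tag;

typedef struct {
    Tag tag;                  /* says which union member is valid */
    union {
        long as_int;          /* valid when tag == TAG_INT */
        const char *as_text;  /* string contents or error message */
    } data;
} Value;

static Value value_int(long n) {
    Value v;
    v.tag = TAG_INT;
    v.data.as_int = n;
    return v;
}

static Value value_string(const char *s) {
    Value v;
    v.tag = TAG_STRING;
    v.data.as_text = s;
    return v;
}

static Value value_error(const char *message) {
    Value v;
    v.tag = TAG_ERROR;
    v.data.as_text = message;
    return v;
}

/* Checked addition: inspect both tags before touching either payload. */
static Value value_add(Value a, Value b) {
    if (a.tag != TAG_INT || b.tag != TAG_INT) {
        return value_error("type error: + expects two numbers");
    }
    return value_int(a.data.as_int + b.data.as_int);
}
```

With these definitions, value_add(value_int(40), value_int(2)) yields an integer value holding 42, while value_add(value_int(1), value_string("x")) yields an error value. A raw C + on the same operands would either fail to compile or do pointer arithmetic, which is the kind of silent corruption the runtime check prevents.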

Key takeaways

1. Code generation walks the AST and writes equivalent C. Same program, different language.

2. Monk targets C for zero dependencies, readable output, and free optimizations from mature C compilers.

3. Generated C files have a fixed structure: runtime include, hoisted functions, main body, cleanup.

4. All operations go through runtime functions. No direct C arithmetic -- the runtime enforces type safety.