Lesson 4.2

Translation Rules

How each Monk construct becomes C. The systematic mapping from AST nodes to C code.


One node, one rule.

The code generator is a big switch statement. It looks at an AST node, checks its type, and applies exactly one translation rule. An integer literal always becomes monk_int(). An if statement always becomes if (monk_is_truthy(...)). No ambiguity, no context-dependence.

This is what makes code generation the most mechanical phase of the compiler. The creativity happened in the parser (tree shape) and the runtime (value semantics). Here, we just follow the rules.
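To make "one node, one rule" concrete, here is a minimal sketch of such a switch, limited to literal nodes. The Node struct, the NodeKind names, and emit_expr are hypothetical stand-ins, not Monk's actual AST or generator. The emitter writes C source text into a buffer, and each case mirrors one of the literal rules in the next section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node kinds: literals only. */
typedef enum { NODE_INT, NODE_FLOAT, NODE_STRING, NODE_BOOL, NODE_NONE } NodeKind;

typedef struct {
    NodeKind kind;
    long int_value;           /* NODE_INT */
    double float_value;       /* NODE_FLOAT */
    const char *string_value; /* NODE_STRING, assumed already escaped for C */
    int bool_value;           /* NODE_BOOL */
} Node;

/* Writes the C expression for node n into out (capacity cap).
   Each case applies exactly one rule and never looks at surrounding nodes.
   Returns snprintf's result: the length of the full expression. */
int emit_expr(const Node *n, char *out, size_t cap) {
    switch (n->kind) {
    case NODE_INT:    return snprintf(out, cap, "monk_int(%ld)", n->int_value);
    /* %g keeps the sketch short; it is not round-trip exact for every double */
    case NODE_FLOAT:  return snprintf(out, cap, "monk_float(%g)", n->float_value);
    case NODE_STRING: return snprintf(out, cap, "monk_string(\"%s\")", n->string_value);
    case NODE_BOOL:   return snprintf(out, cap, "monk_bool(%d)", n->bool_value ? 1 : 0);
    case NODE_NONE:   return snprintf(out, cap, "monk_none()");
    }
    assert(0 && "unhandled node kind");
    return -1;
}
```

Adding a node type means adding a case; no existing case has to change.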

Literals: creating values from nothing.

Every literal in Monk maps to a runtime constructor. The constructor allocates a MonkValue with the right type tag and data.

Monk       C
42         monk_int(42)
3.14       monk_float(3.14)
"hello"    monk_string("hello")
true       monk_bool(1)
false      monk_bool(0)
none       monk_none()

Straightforward. Each Monk type has exactly one C constructor.
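On the runtime side, a constructor only has to set the tag and store the payload. The sketch below assumes a small MonkValue struct passed by value, in which only strings own heap memory. The tag names (MONK_INT and so on) and the field layout are illustrative; the real runtime may lay values out and allocate them differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative tags and layout; not necessarily the real runtime's. */
typedef enum { MONK_INT, MONK_FLOAT, MONK_STRING, MONK_BOOL, MONK_NONE } MonkTag;

typedef struct {
    MonkTag tag;
    union {
        long i;     /* MONK_INT */
        double f;   /* MONK_FLOAT */
        char *s;    /* MONK_STRING: heap copy owned by this value */
        int b;      /* MONK_BOOL: 0 or 1 */
    } as;
} MonkValue;

MonkValue monk_int(long i)     { MonkValue v; v.tag = MONK_INT;   v.as.i = i; return v; }
MonkValue monk_float(double f) { MonkValue v; v.tag = MONK_FLOAT; v.as.f = f; return v; }
MonkValue monk_bool(int b)     { MonkValue v; v.tag = MONK_BOOL;  v.as.b = (b != 0); return v; }
MonkValue monk_none(void)      { MonkValue v; v.tag = MONK_NONE;  v.as.i = 0; return v; }

/* Copies the C string so the value owns its bytes, even when s is a literal. */
MonkValue monk_string(const char *s) {
    MonkValue v;
    v.tag = MONK_STRING;
    v.as.s = malloc(strlen(s) + 1);
    assert(v.as.s != NULL);
    strcpy(v.as.s, s);
    return v;
}
```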

Variables: declaring and copying.

In Monk, both let and const create variables. The difference is enforced by the checker (const cannot be reassigned), not by codegen. Both produce the same C.

let x int = 42
const greeting = "hello"

Becomes:

MonkValue x = monk_int(42);
MonkValue greeting = monk_string("hello");

When assigning one variable to another, Monk uses value semantics: the new variable gets its own copy. This is critical. Without the deep copy, both variables would point to the same memory, so freeing one would leave the other pointing at freed memory.

let y = x

Becomes:

MonkValue y = monk_deep_copy(x);
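The need for the copy is easiest to see with a heap-owning value such as a string. In this sketch (a hypothetical by-value layout trimmed to ints and strings), monk_deep_copy duplicates the string buffer, so calling monk_free on the original leaves the copy intact. A plain C assignment, MonkValue y = x;, would copy only the pointer, and freeing both variables would be a double free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { MONK_INT, MONK_STRING } MonkTag;
typedef struct {
    MonkTag tag;
    union { long i; char *s; } as;   /* s is heap-owned */
} MonkValue;

static char *dup_cstr(const char *s) {
    char *p = malloc(strlen(s) + 1);
    assert(p != NULL);
    return strcpy(p, s);
}

MonkValue monk_int(long i)           { MonkValue v; v.tag = MONK_INT;    v.as.i = i; return v; }
MonkValue monk_string(const char *s) { MonkValue v; v.tag = MONK_STRING; v.as.s = dup_cstr(s); return v; }

/* v arrives as a struct copy; for strings, give that copy its own buffer. */
MonkValue monk_deep_copy(MonkValue v) {
    if (v.tag == MONK_STRING)
        v.as.s = dup_cstr(v.as.s);
    return v;
}

/* Releases whatever heap memory the value owns. */
void monk_free(MonkValue v) {
    if (v.tag == MONK_STRING)
        free(v.as.s);
}
```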

Assignment: the order matters.

Reassigning a variable seems simple: free the old value, assign the new one. But the order is a trap.

x = x + 1

The wrong way to translate this:

// WRONG: use-after-free bug
monk_free(x);
x = monk_add(x, monk_int(1));  // x is already freed!

The right way:

// CORRECT: compute first, then free
MonkValue _new = monk_add(x, monk_int(1));
monk_free(x);
x = _new;

This was an actual bug in Monk's code generator. The pattern is: compute the new value while the old value is still alive, then free the old value, then assign. Never free before computing. This applies to every reassignment, not just arithmetic.
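Because the order never changes, the generator can emit it from one template. Here is a sketch assuming a hypothetical emit_reassign helper that receives the variable name and the already-generated C for the right-hand side. Wrapping the three statements in braces is an assumption of this sketch, chosen so that two reassignments in the same C block do not both declare _new.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emits C for the Monk statement `name = <rhs>`, where rhs_c is the C
   expression already generated for the right-hand side. Order is fixed:
   1) compute into _new while the old value is alive, 2) free the old value,
   3) assign. Returns the number of characters written. */
int emit_reassign(char *out, size_t cap, const char *name, const char *rhs_c) {
    assert(strlen(name) > 0);
    int n = snprintf(out, cap,
                     "{\n"
                     "    MonkValue _new = %s;\n"
                     "    monk_free(%s);\n"
                     "    %s = _new;\n"
                     "}\n",
                     rhs_c, name, name);
    assert(n >= 0 && (size_t)n < cap);   /* output must fit in the buffer */
    return n;
}
```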

Arithmetic and comparison: no C operators.

This is the most important rule to internalize. Monk's + does NOT become C's +. Every operation goes through a runtime function.

Monk       C
x + y      monk_add(x, y)
x - y      monk_sub(x, y)
x * y      monk_mul(x, y)
x / y      monk_div(x, y)
x == y     monk_equal(x, y)
x != y     monk_not_equal(x, y)
x < y      monk_less(x, y)
x > y      monk_greater(x, y)
x >= y     monk_greater_equal(x, y)

In other words, x + y in Monk never becomes x + y in C. Every operation goes through the runtime because x and y are MonkValues (tagged unions), not raw C ints, and only a runtime function can inspect the type tags and choose the right behavior.
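Inside the runtime, C's own operators do appear, but only on unwrapped payloads after the tags have been checked. A sketch of monk_add and monk_less under an illustrative numbers-and-booleans layout follows. Whether Monk promotes mixed int/float arithmetic or the checker rejects it is not covered in this lesson; the sketch assumes promotion to float.

```c
#include <assert.h>

/* Illustrative layout: numbers and booleans only. */
typedef enum { MONK_INT, MONK_FLOAT, MONK_BOOL } MonkTag;
typedef struct {
    MonkTag tag;
    union { long i; double f; int b; } as;
} MonkValue;

MonkValue monk_int(long i)     { MonkValue v; v.tag = MONK_INT;   v.as.i = i; return v; }
MonkValue monk_float(double f) { MonkValue v; v.tag = MONK_FLOAT; v.as.f = f; return v; }
MonkValue monk_bool(int b)     { MonkValue v; v.tag = MONK_BOOL;  v.as.b = (b != 0); return v; }

/* Unwraps a numeric payload as a double (ints are widened). */
static double num(MonkValue v) {
    assert(v.tag == MONK_INT || v.tag == MONK_FLOAT);
    return v.tag == MONK_INT ? (double)v.as.i : v.as.f;
}

/* Monk `+`: dispatch on the tags, then use C's + on the raw payloads. */
MonkValue monk_add(MonkValue a, MonkValue b) {
    if (a.tag == MONK_INT && b.tag == MONK_INT)
        return monk_int(a.as.i + b.as.i);
    return monk_float(num(a) + num(b));   /* assumed rule: mixed int/float gives float */
}

/* Monk `<`: the result is a MonkValue boolean, not a C int. */
MonkValue monk_less(MonkValue a, MonkValue b) {
    if (a.tag == MONK_INT && b.tag == MONK_INT)
        return monk_bool(a.as.i < b.as.i);
    return monk_bool(num(a) < num(b));
}
```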

Control flow: truthiness is the bridge.

Monk's control flow maps cleanly onto C's, with one twist: every condition passes through monk_is_truthy(). This function converts a MonkValue into a C truth value; every value is truthy except false, none, and 0.

if x > 10 {
    show("big")
}

Becomes:

MonkValue _tmp_1 = monk_greater(x, monk_int(10));
if (monk_is_truthy(_tmp_1)) {
    monk_show(monk_string("big"));
}
monk_free(_tmp_1);
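monk_is_truthy() is the piece that turns a MonkValue into something C's if can test. Below is a sketch under an illustrative layout, following the stated rule that false, none, and 0 are falsy. The lesson does not say how 0.0 or an empty string behave, so this version assumes both are truthy; the real runtime may differ.

```c
#include <assert.h>

/* Illustrative layout; not necessarily the real runtime's. */
typedef enum { MONK_INT, MONK_FLOAT, MONK_STRING, MONK_BOOL, MONK_NONE } MonkTag;
typedef struct {
    MonkTag tag;
    union { long i; double f; const char *s; int b; } as;
} MonkValue;

/* Returns a plain C int, so the call can sit directly inside if (...). */
int monk_is_truthy(MonkValue v) {
    switch (v.tag) {
    case MONK_BOOL:   return v.as.b != 0;   /* false is falsy */
    case MONK_NONE:   return 0;             /* none is falsy */
    case MONK_INT:    return v.as.i != 0;   /* 0 is falsy */
    case MONK_FLOAT:                        /* assumption: every float is truthy */
    case MONK_STRING: return 1;             /* assumption: every string is truthy */
    }
    assert(0 && "unknown tag");
    return 0;
}
```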

While loops follow the same pattern:

while count > 0 {
    count = count - 1
}

Becomes:

while (1) {
    MonkValue _tmp_1 = monk_greater(count, monk_int(0));
    if (!monk_is_truthy(_tmp_1)) { monk_free(_tmp_1); break; }
    monk_free(_tmp_1);
    MonkValue _new = monk_sub(count, monk_int(1));
    monk_free(count);
    count = _new;
}

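The loop is emitted as while (1) with an explicit break so that the condition is re-evaluated at the top of every iteration. Evaluating the condition allocates a temp value, and that temp must be freed on both paths: just before break when the loop exits, and right after the test when it continues.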
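This shape can also come from a template. A hypothetical emit_while helper, given the C for the condition and the already-generated body, might look like the sketch below. The fixed temp name _tmp_1 matches the example above; a real generator would presumably number its temporaries. The shorter while (monk_is_truthy(cond)) form is avoided because the condition's temp would never be freed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emits the while-loop shape from the lesson. cond_c is the C expression for
   the condition; body_c is the already-generated, already-indented body.
   The condition temp is freed on both the exit path and the continue path. */
int emit_while(char *out, size_t cap, const char *cond_c, const char *body_c) {
    assert(strlen(cond_c) > 0 && body_c != NULL);
    int n = snprintf(out, cap,
                     "while (1) {\n"
                     "    MonkValue _tmp_1 = %s;\n"
                     "    if (!monk_is_truthy(_tmp_1)) { monk_free(_tmp_1); break; }\n"
                     "    monk_free(_tmp_1);\n"
                     "%s"
                     "}\n",
                     cond_c, body_c);
    assert(n >= 0 && (size_t)n < cap);   /* output must fit in the buffer */
    return n;
}
```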

For-in: iterating over arrays.

for item in items {
    show(item)
}

Becomes a C for-loop that walks the array's elements:

for (int _i = 0; _i < monk_array_len(items); _i++) {
    MonkValue item = monk_deep_copy(monk_array_get(items, _i));
    monk_show(item);
    monk_free(item);
}

Each element is deep-copied into the loop variable and freed at the end of the iteration. This is value semantics again: mutating item inside the loop does not affect the original array.
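To watch value semantics at work, here is the for-in translation compiled against a tiny hypothetical runtime in which every value is a string and an array is a plain C array behind a MonkArray pointer. The lesson passes items as a MonkValue; the pointer is a simplification of this sketch. The loop body overwrites the first character of item, yet the original elements are unchanged afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* String-only MonkValue and a bare array type: simplifications for this sketch. */
typedef struct { char *s; } MonkValue;                  /* heap-owned C string */
typedef struct { int len; MonkValue *items; } MonkArray;

MonkValue monk_string(const char *s) {
    MonkValue v;
    v.s = malloc(strlen(s) + 1);
    assert(v.s != NULL);
    strcpy(v.s, s);
    return v;
}
MonkValue monk_deep_copy(MonkValue v) { return monk_string(v.s); }
void monk_free(MonkValue v) { free(v.s); }
int monk_array_len(const MonkArray *a) { return a->len; }
MonkValue monk_array_get(const MonkArray *a, int i) { return a->items[i]; }  /* borrowed: no copy */

/* The for-in translation, with a body that mutates the loop variable.
   Returns how many elements were visited. */
int mutate_each(const MonkArray *items) {
    int visited = 0;
    for (int _i = 0; _i < monk_array_len(items); _i++) {
        MonkValue item = monk_deep_copy(monk_array_get(items, _i));
        if (item.s[0] != '\0')
            item.s[0] = '#';          /* changes the copy only */
        monk_free(item);
        visited++;
    }
    return visited;
}
```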

Arrays and records: compound values.

Arrays and records are built with variadic constructors that take the element count followed by the elements themselves.

let numbers = [1, 2, 3]
let point = {x: 10, y: 20}

Becomes:

MonkValue numbers = monk_array(3, monk_int(1), monk_int(2), monk_int(3));
MonkValue point = monk_record(2, "x", monk_int(10), "y", monk_int(20));

The first argument is always the count. For records, field names alternate with their values. The runtime uses this to build the internal data structures.
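The count-first convention is what lets a C variadic function know how many arguments to read, since va_arg cannot detect the end of the argument list on its own. Here is a sketch of both constructors using <stdarg.h>, under a hypothetical flattened layout: one struct with an items array for elements or field values and a names array for record keys. The real runtime's internal structures are not shown in this lesson.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

typedef enum { MONK_INT, MONK_ARRAY, MONK_RECORD } MonkTag;

/* Hypothetical flattened layout, for illustration only. */
typedef struct MonkValue {
    MonkTag tag;
    long i;                    /* MONK_INT payload */
    int count;                 /* number of elements or fields */
    struct MonkValue *items;   /* array elements / record field values */
    const char **names;        /* record field names; NULL for arrays */
} MonkValue;

static void *xmalloc(size_t n) {
    void *p = malloc(n > 0 ? n : 1);   /* avoid malloc(0) returning NULL */
    assert(p != NULL);
    return p;
}

MonkValue monk_int(long i) {
    MonkValue v = { MONK_INT, i, 0, NULL, NULL };
    return v;
}

/* monk_array(n, v1, ..., vn): read exactly n MonkValues. */
MonkValue monk_array(int count, ...) {
    MonkValue v = { MONK_ARRAY, 0, count, NULL, NULL };
    v.items = xmalloc(sizeof(MonkValue) * (size_t)count);
    va_list ap;
    va_start(ap, count);
    for (int k = 0; k < count; k++)
        v.items[k] = va_arg(ap, MonkValue);
    va_end(ap);
    return v;
}

/* monk_record(n, "name1", v1, ..., "nameN", vN): names and values alternate. */
MonkValue monk_record(int count, ...) {
    MonkValue v = { MONK_RECORD, 0, count, NULL, NULL };
    v.items = xmalloc(sizeof(MonkValue) * (size_t)count);
    v.names = xmalloc(sizeof(const char *) * (size_t)count);
    va_list ap;
    va_start(ap, count);
    for (int k = 0; k < count; k++) {
        v.names[k] = va_arg(ap, char *);   /* string literals arrive as char * */
        v.items[k] = va_arg(ap, MonkValue);
    }
    va_end(ap);
    return v;
}
```

The generator has to pass a count that matches the arguments exactly. A mismatch is undefined behavior in C, and va_arg has no way to detect it.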

Error handling: guard and against.

Monk's guard/against pattern maps to C's setjmp/longjmp. This is the non-local jump mechanism that C provides for error handling -- similar in spirit to try/catch, but lower level.

guard result = risky_call() against err {
    show(err)
}

The code generator emits a setjmp call that saves the current execution state and returns 0, so the guarded expression runs. If that expression throws (via longjmp), setjmp returns a second time with a nonzero value, and the generated code runs the against block instead.

The details of the setjmp/longjmp pattern are covered in the runtime phase (lesson 3.5). From codegen's perspective, the rule is: emit the jump setup, emit the guarded expression, emit the fallback block.
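As a rough preview of that pattern (not the runtime's actual code), here is a self-contained sketch. monk_current_guard, monk_throw, monk_error_message, and last_shown are hypothetical names. A single saved pointer stands in for what would presumably be a stack of guards in a real runtime, and guarded() plays the role of the code emitted for the guard example above, returning -1 when the against block runs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical runtime state: the innermost active guard and the last error. */
static jmp_buf *monk_current_guard = NULL;
static const char *monk_error_message = NULL;
static const char *last_shown = NULL;   /* stands in for show()'s output */

/* A failing runtime operation records its message and jumps to the guard. */
static void monk_throw(const char *msg) {
    monk_error_message = msg;
    assert(monk_current_guard != NULL);   /* unguarded errors: out of scope here */
    longjmp(*monk_current_guard, 1);
}

static int risky_call(int fail) {
    if (fail)
        monk_throw("risky_call failed");
    return 42;
}

/* Roughly the shape codegen could emit for:
       guard result = risky_call() against err { show(err) }
   Returns the guarded result, or -1 if the against block ran. */
int guarded(int fail) {
    jmp_buf env;
    jmp_buf *outer = monk_current_guard;   /* save the outer guard for nesting */
    volatile int result = -1;              /* volatile: must survive longjmp */
    monk_current_guard = &env;
    if (setjmp(env) == 0) {
        result = risky_call(fail);         /* first return: run the guarded expression */
    } else {
        last_shown = monk_error_message;   /* second return: the against block */
    }
    monk_current_guard = outer;            /* restore on both paths */
    return result;
}
```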

Function calls: built-ins and user functions.

Built-in functions like show, len, and push map directly to C runtime functions:

Monk              C
show(x)           monk_show(x)
len(arr)          monk_len(arr)
push(arr, val)    monk_push(arr, val)

User-defined functions are called by their hoisted C name. How that name is determined is the subject of the next lesson.
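In the generator, the built-in mapping can be a small lookup table consulted before falling back to the user-function path. Here is a sketch with a hypothetical builtin_c_name helper, covering only the three built-ins listed above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Built-in name mapping from the table above. Anything not listed is treated
   as a user function, whose hoisted C name is decided elsewhere. */
static const struct { const char *monk; const char *c; } BUILTINS[] = {
    { "show", "monk_show" },
    { "len",  "monk_len"  },
    { "push", "monk_push" },
};

/* Returns the runtime function for a built-in, or NULL for a user function. */
const char *builtin_c_name(const char *monk_name) {
    assert(monk_name != NULL);
    for (size_t k = 0; k < sizeof BUILTINS / sizeof BUILTINS[0]; k++)
        if (strcmp(BUILTINS[k].monk, monk_name) == 0)
            return BUILTINS[k].c;
    return NULL;
}
```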

Key takeaways

1. Each AST node type has exactly one translation rule. The code generator is a deterministic switch.

2. Assignment must compute the new value before freeing the old one. Reversing this order causes use-after-free bugs.

3. No C arithmetic operators. Every operation goes through runtime functions that work on MonkValues.

4. Control flow conditions pass through monk_is_truthy(). Arrays iterate with deep-copied loop variables.