Lesson 4.3

Function Hoisting

Why functions are lifted above main(), and how the two-pass approach makes it work.

monk c 15 min read

Two languages, two rules about order.

In Monk, you can define a function anywhere. It's just a value assigned to a variable. You can put it inside a block, after the code that calls it, wherever you like:

let result = add(3, 4)
show(result)

let add = (a int, b int) int {
    return a + b
}

C does not allow this. In C, a function must be declared before it's called. And C does not allow function definitions inside other functions -- every function must be at the top level of the file.

This creates a mismatch. Monk says "functions are values, put them anywhere." C says "functions are top-level declarations, define them before use." The code generator needs to bridge this gap.

The solution: hoist everything.

The code generator takes every function expression in the Monk program and lifts it out of wherever it appears. Each one becomes a static C function placed above main(). The original variable assignment is replaced with a direct call to that C function.

Here's what the previous example becomes:

#include "runtime.h"

// Hoisted: "add" becomes _fn_add
static MonkValue _fn_add(MonkValue a, MonkValue b) {
    MonkValue _ret = monk_add(a, b);
    return _ret;
}

int main(int argc, char *argv[]) {
    monk_init(argc, argv);

    MonkValue result = _fn_add(monk_int(3), monk_int(4));
    monk_show(result);

    monk_free(result);
    return 0;
}

The function moved to the top. The call site now references _fn_add directly. No function pointers, no indirection. Just a normal C function call.

Two passes, not one.

The code generator walks the AST twice. Each pass has a specific job.

First pass: collect functions.

Scan the entire AST for function expression nodes. For each one, assign a unique C name (like _fn_add or _fn_1) and record the mapping: Monk variable name to C function name. Then emit the C function definition.

Second pass: emit main body.

Walk the AST again and emit all non-function statements into main(). When a function call is encountered, look up the C name from the map and emit a direct call.

Two passes, not one. The first pass finds functions. The second pass emits code. This is why the code generator needs the full AST, not a stream of tokens.

How functions get their C names.

When a function expression is assigned to a named variable, the code generator can use that name: add becomes _fn_add. The _fn_ prefix avoids collisions with C standard library functions and runtime functions.

For anonymous functions or when names collide, a counter provides uniqueness: _fn_1, _fn_2, etc. The counter increments globally, so every function in the program gets a distinct C name regardless of scope.

let add = (a int, b int) int { return a + b }
let mul = (a int, b int) int { return a * b }
let apply = (f, x int, y int) int { return f(x, y) }

These might become:

static MonkValue _fn_add(MonkValue a, MonkValue b) { ... }
static MonkValue _fn_mul(MonkValue a, MonkValue b) { ... }
static MonkValue _fn_apply(MonkValue f, MonkValue x, MonkValue y) { ... }

The static keyword limits the function's visibility to the current file. This prevents name collisions with other C files that might be linked in.

A complete example with multiple functions.

let square = (n int) int {
    return n * n
}

let sum_of_squares = (a int, b int) int {
    return square(a) + square(b)
}

let result = sum_of_squares(3, 4)
show(result)

The code generator produces:

#include "runtime.h"

static MonkValue _fn_square(MonkValue n) {
    MonkValue _ret = monk_mul(n, n);
    return _ret;
}

static MonkValue _fn_sum_of_squares(MonkValue a, MonkValue b) {
    MonkValue _tmp_1 = _fn_square(a);
    MonkValue _tmp_2 = _fn_square(b);
    MonkValue _ret = monk_add(_tmp_1, _tmp_2);
    monk_free(_tmp_1);
    monk_free(_tmp_2);
    return _ret;
}

int main(int argc, char *argv[]) {
    monk_init(argc, argv);

    MonkValue result = _fn_sum_of_squares(monk_int(3), monk_int(4));
    monk_show(result);

    monk_free(result);
    return 0;
}

Both functions are hoisted. _fn_sum_of_squares calls _fn_square directly -- no lookup, no indirection. The hoisting ensures _fn_square is defined before _fn_sum_of_squares uses it.

The trade-off: what about closures?

Hoisting functions as static C functions is simple and fast. But it has a limitation: closures.

A closure is a function that captures variables from its enclosing scope. When a function is hoisted out of its original scope and placed at the top level, those captured variables are no longer accessible.

let make_adder = (n int) {
    return (x int) int {
        return x + n  // captures "n" from outer scope
    }
}

In the current design, closures capture by copy at the time the function is created. The inner function gets its own copy of n. This aligns with Monk's value semantics -- no shared mutable state.

Full mutable closure support (where the inner function could mutate the outer variable and the outer scope would see the change) is deferred. For now, functions are compiled as real C functions, not as function pointers with captured environments. This is simpler, faster, and sufficient for most programs.

Key takeaways

Monk allows functions anywhere. C requires them at the top level, declared before use. Hoisting bridges this gap.

Two passes: first collects and emits functions above main(), second emits the program body inside main().

Each function gets a unique C name (_fn_add, _fn_1) and is marked static for file-level isolation.

Functions are compiled as real C functions, not function pointers. Simple and fast, but full mutable closures are deferred.