Lesson 6.3

Building the Checker

Walking the AST a second time. Scope, inference, all-paths return, and the bugs we found while building it.

go 15 min read

A second walk over the same tree.

The parser builds an AST. The code generator walks it and emits C. The checker is a third walker -- it traverses the same tree looking for semantic errors instead of generating output.

The checker lives in src/types/. Its public interface is a single function:

func Check(program *syntax.Program) (*Info, []Error)

It takes the parsed AST and returns two things: an Info struct containing the type information it discovered (used by codegen for unboxing), and a slice of errors. If the slice is non-empty, compilation stops.

Scope: a stack of symbol tables.

The checker tracks what names are in scope using a linked list of symbol tables -- each scope is a map from name to type, with a pointer to its parent scope. This is the standard approach for block-scoped languages.

type Scope struct {
    vars   map[string]Type
    parent *Scope
}

When entering a new block (function body, if branch, for-in body), the checker pushes a new child scope. When leaving, it pops back to the parent. Looking up a name walks up the chain until found or the root is reached.

Monk allows shadowing -- a let i inside a for-in body can shadow the loop variable -- but not mutation of the loop variable itself. The checker enforces this by marking the loop variable as const in the loop scope, and giving the body its own child scope so any new let i inside it shadows rather than conflicts.

A bug we found: the initial implementation gave the loop body the same scope as the loop variable. A let i = ... inside the body would overwrite the const loop binding, bypassing the const check. Giving the body its own child scope fixed it -- Go-style shadowing allowed, direct mutation still rejected.

Hoisting: two passes over declarations.

Before the main checking pass, the checker does a quick scan over the top-level statements and pre-registers all function declarations. This is called hoisting -- it's what makes mutual recursion work.

Without hoisting, a function that calls another function defined later in the file would produce an "undefined variable" error. With hoisting, all function signatures are known before any function body is checked. The checker knows the type of every function at every call site.

// Both functions use each other.
// Hoisting pre-registers both signatures before either body is checked.

let ping = (n int) string {
    if n == 0 { return "done" }
    return pong(n - 1)    // pong not defined yet -- but hoisting knows its signature
}

let pong = (n int) string {
    return ping(n - 1)
}

The checks: what the checker verifies.

The checker walks each statement and expression, threading a scope and collecting errors. Here's what it enforces at each construct:

Variable declarations

let x int = 42        // annotation present: check RHS assignable to int
let y = "hello"       // no annotation: infer type from RHS (string)
y = 99                // error: cannot assign int to string variable 'y'

If an annotation is present, the initializer must be assignable to it. If absent, the initializer's type becomes the variable's type from that point forward.

Array element enforcement

let nums int[] = [1, 2, 3]    // ok
let bad  int[] = [1, "two"]   // error: string not assignable to int[]

nums[0] = 99     // ok
nums[0] = "hi"   // error: string not assignable to element of int[]

Record shape

type User { name string, age int }

let u User = { name: "Alice", age: 30 }    // ok
let v User = { name: "Bob" }               // error: missing field 'age'
let w User = { name: "Carol", age: 30, x: true }  // error: extra field 'x'

Function calls

let add = (a int, b int) int { return a + b }

add(1, 2)      // ok
add(1)         // error: wrong number of arguments (want 2, got 1)
add(1, "two")  // error: argument 2 has type string, want int

Compound assignment

Compound operators (+=, -=, *=, etc.) are checked as their implied binary operation. This caught a real bug: let s = "hi"; s -= "world" was passing the checker because only the target type was checked, not the operation. Adding checkCompoundOp that validates the implied binary on the actual types fixed it.

All-paths return analysis.

A non-none function must return a value on every possible execution path. The checker walks the function body and determines whether a return is guaranteed. The analysis is conservative: it knows about if/else and throw, but conservatively rejects loops as non-guaranteeing (a loop might execute zero times).

let classify = (n int) string {
    if n > 0 {
        return "positive"
    } else if n < 0 {
        return "negative"
    }
    // error: function 'classify' may not return on all paths
    // (the else branch -- n == 0 -- falls through)
}

let classify = (n int) string {
    if n > 0 {
        return "positive"
    } else if n < 0 {
        return "negative"
    } else {
        return "zero"         // all paths covered now
    }
}

A throw also counts as a terminating path. If every branch either returns or throws, the function passes the check.

Why not loops? Proving that a loop always executes at least once (so any return inside it is guaranteed) requires knowing the loop bounds. That's a harder problem. The checker takes the conservative approach: if your only return is inside a loop, you need an unconditional return after the loop too.

The Info struct: type data for codegen.

The checker doesn't just reject bad programs -- it also produces information that the code generator uses. types.Check returns an *Info struct containing:

Per-expression types. Every expression in the AST has a type. Codegen uses this to decide whether to emit a raw C scalar or a MonkValue.

Per-declaration types. Each variable declaration is annotated with the type the checker resolved for it. Used when emitting C variable declarations.

Per-function signatures. The resolved parameter types and return type for each function. Used to decide whether a function gets an unboxed C signature.

This is what connects the checker to codegen. Without this information, the code generator would have to repeat all the type inference logic itself. Instead, it just asks: "what type did the checker determine for this expression?" and acts accordingly.

Key takeaways

The checker is a second AST walk -- same tree as codegen, looking for errors instead of generating output.

Scope is a linked list of symbol tables. Shadowing is allowed; mutating a const loop variable is not.

Hoisting pre-registers all function signatures before checking bodies. Mutual recursion just works.

The checker returns an Info struct alongside errors. Codegen uses it for scalar unboxing -- no re-inference needed.