Lesson 2.3

Designing Monk's AST

Every node type. Expressions vs statements. The full tree design.

monk 15 min read

The fundamental split: expressions vs statements.

Every AST node is either an expression or a statement. The difference is simple: expressions produce a value, statements don't.

Expressions (produce a value)

42 — produces the number 42

a + b — produces their sum

greet("world") — produces the return value

(x int) int { ... } — produces a function

Statements (perform an action)

let x = 42 — declares a variable

x = 10 — assigns a value

if x > 0 { ... } — controls flow

return x — exits a function

This split matters because it determines where things can appear. An expression can be used anywhere a value is expected — as an argument, in a condition, on the right side of =. A statement cannot be used as a value.

The key decision: assignment is a statement.

In C and JavaScript, assignment is an expression. x = 5 both assigns and produces the value 5. This enables patterns like while (line = readLine()), but it also enables one of the most common bugs in programming:

// C / JavaScript: compiles fine, always true
if (x = 5) {
    // this was supposed to be x == 5
    // but it assigns 5 to x and evaluates as true
}

Monk makes assignment a statement. You cannot write if x = 5 — the parser rejects it because a statement can't appear where an expression is expected. This eliminates the bug entirely at the grammar level.

// Monk: assignment is a statement
x = 5          // fine, standalone statement
if x == 5 {    // fine, comparison expression
    // ...
}
if x = 5 {     // PARSE ERROR: expected expression, got assignment
    // ...
}

Trade-off acknowledged. You can't write while (ch = read()) in Monk. You need a separate assignment before the condition. Monk chose safety over brevity. Python, Go, and Rust made the same choice.

Expression nodes: 15 types.

Every expression in Monk maps to one of these AST node types. Here's a real Monk program that touches all 15 — the table below maps each highlighted construct to its node type.

// Literals: NumberExpr, StringExpr, BoolExpr, NoneExpr
let count int = 42              // NumberExpr(42)
let pi = 3.14                   // NumberExpr(3.14)
let name = "Monk"               // StringExpr
let greeting = `Hello ${name}!`  // StringExpr (template literal)
let done = false                // BoolExpr
let nothing = none              // NoneExpr

// Variable reference: IdentExpr
let doubled = count             // IdentExpr("count")

// Prefix operators: UnaryExpr
let neg = -count                // UnaryExpr(-, IdentExpr)
let inv = !done                 // UnaryExpr(!, IdentExpr)

// Arithmetic and comparison: BinaryExpr
let sum = count + 8             // BinaryExpr(+, IdentExpr, NumberExpr)
let big = count > 10            // BinaryExpr(>, IdentExpr, NumberExpr)

// Grouping: GroupExpr
let result = (count + 8) * 2   // GroupExpr wraps the BinaryExpr

// Collection literals: ArrayExpr, RecordExpr
let scores = [10, 20, 30]      // ArrayExpr([NumberExpr, NumberExpr, NumberExpr])
let point = {x: 10, y: 20}    // RecordExpr([field: NumberExpr, ...])

// Access: IndexExpr, PropertyExpr
let first = scores[0]          // IndexExpr(IdentExpr, NumberExpr)
let px = point.x               // PropertyExpr(IdentExpr, "x")

// Functions and calls: FunctionExpr, CallExpr
const add = (a int, b int) int { return a + b }  // FunctionExpr
let total = add(count, 8)                         // CallExpr(IdentExpr, [IdentExpr, NumberExpr])

// Error: ThrowExpr
const safe_div = (a int, b int) int {
    if b == 0 { return throw "division by zero" } // ThrowExpr("division by zero")
    return a / b
}

Every expression in Monk maps to one of these AST node types:

Literals (leaf nodes, no children)

NumberExpr 42, 3.14, 0xFF Integer or float value

StringExpr "hello", `world` String value

BoolExpr true, false Boolean value

NoneExpr none The absence of a value

IdentExpr x, name, counter A variable reference

Operations (inner nodes, have children)

UnaryExpr -x, !done, ~mask Prefix operator + one operand

BinaryExpr a + b, x == y Left operand + operator + right operand

CallExpr add(1, 2) Function expression + argument list

IndexExpr arr[0] Collection + index expression

PropertyExpr record.field Object + field name

Compound expressions

ArrayExpr [1, 2, 3] List of element expressions

RecordExpr {name: "Monk", v: 2} List of key-value pairs

FunctionExpr (a int) int { ... } Params + return type + body

ThrowExpr throw "error" Error value expression

GroupExpr (a + b) Parenthesized sub-expression

Functions are values, not declarations.

In many languages, function declarations are special syntax at the statement level. Monk takes a different approach: a function literal is an expression, just like a number or a string. You assign it to a variable with let or const.

// The function literal (a int, b int) int { return a + b }
// is an expression, just like 42 or "hello"
const add = (a int, b int) int {
    return a + b
}

// You can pass them directly as arguments
let nums = [3, 1, 2]
sort(nums, (a int, b int) int { return a - b })

In the AST, a FunctionExpr sits in the same position any other expression would. The parser treats (a int, b int) int { return a + b } the same way it treats 42 — it's a value that gets assigned, passed, or returned.

No func or function keyword. There's no function declaration syntax in Monk. If it starts with ( and has parameters with types, it's a function expression. This keeps the grammar simpler — one fewer statement type to parse.

Statement nodes: 14 types.

Declarations

VarDecl let x int = 42 Name + optional type + optional init

TypeDecl type Point { x int, y int } Named record shape

Assignment

AssignStmt x = 10, arr[0] = 5 Target + value expression

Control flow

IfStmt if cond { } else { } Condition + then block + optional else

WhileStmt while cond { } Condition + body

ForStmt for item in list { } Variable + iterable + body

ReturnStmt return x + 1 Optional return value

BreakStmt break Exit innermost loop

ContinueStmt continue Skip to next iteration

Error handling & modules

GuardStmt guard { } against e { } Try block + error handler

ExprStmt doSomething() Expression used as statement

BlockStmt { ... } Sequence of statements

UseStmt use math Import a module

ExportStmt export const pi = 3.14 Make declaration public

Note ExprStmt — the bridge between the two worlds. When you write print("hello") as a standalone line, the parser wraps the call expression in an ExprStmt. The expression produces a value (the return value of print), but it's used as a statement — the value is discarded.

Every node knows where it came from.

Every AST node embeds a position: Pos{Line, Column}. This is the location in the source file where the node begins.

Why does this matter? Because every phase after parsing needs to point back to the source when something goes wrong:

Type checker

"Type mismatch at line 12, column 5" — uses the AST node's position.

Code generator

Emits #line 12 "main.monk" directives in the generated C code, so runtime errors map back to Monk source.

LSP (future)

"Go to definition" needs to know where each identifier was defined.

Lesson learned the hard way. Position tracking was added to Monk's AST after the initial implementation. Without it, the code generator couldn't emit #line directives, making runtime errors nearly impossible to debug. Add positions from the start.

Type annotations: space-separated, not colon-separated.

Most languages with type annotations use a colon: let x: int = 42. Monk uses space separation instead: let x int = 42.

// Monk's type syntax
let count int = 0
let names string[] = ["Alice", "Bob"]
let maybe int? = none
const greet = (name string) string {
    return "Hello, " + name
}

In the AST, the type is stored as a string identifier. Array types get a [] suffix, optional types get a ? suffix. The parser reads the type name as an identifier, then checks for these suffixes. No colon token needed — one fewer token to lex and parse.

A full program, fully parsed.

const double = (n int) int {
    return n * 2
}
let result = double(21)

The complete AST:

Program

VarDecl (const, name: "double")

init: FunctionExpr

params: [("n", type: "int")]

returnType: "int"

body: BlockStmt

ReturnStmt

value: BinaryExpr (*)

left: IdentExpr ("n")

right: NumberExpr (2)

VarDecl (let, name: "result")

init: CallExpr

func: IdentExpr ("double")

args: [NumberExpr (21)]

Everything the compiler needs to generate code is in this tree. Variable names, types, function bodies, call arguments. The source text is no longer needed.

Key takeaways

Expressions produce values, statements don't. Assignment is a statement in Monk — prevents if x = 5 bugs.

29 node types: 15 expressions + 14 statements. Each is a struct with typed fields, not a generic "node" with string properties.

Functions are expressions assigned to variables. No function declaration syntax, no func keyword.

Every node embeds Pos{Line, Column}. Add position tracking from day one — you'll need it for error messages and source mapping.