Lesson 2.3

Designing Monk's AST

Every node type. Expressions vs statements. The full tree design.

monk 15 min read

The fundamental split: expressions vs statements.

Every AST node is either an expression or a statement. The difference is simple: expressions produce a value, statements don't.

Expressions (produce a value)

42 — produces the number 42

a + b — produces their sum

greet("world") — produces the return value

(x int) int { ... } — produces a function

Statements (perform an action)

let x = 42 — declares a variable

x = 10 — assigns a value

if x > 0 { ... } — controls flow

return x — exits a function

This split matters because it determines where things can appear. An expression can be used anywhere a value is expected — as an argument, in a condition, on the right side of =. A statement cannot be used as a value.

The key decision: assignment is a statement.

In C and JavaScript, assignment is an expression. x = 5 both assigns and produces the value 5. This enables patterns like while (line = readLine()), but it also enables one of the most common bugs in programming:

// C / JavaScript: compiles fine, always true
if (x = 5) {
    // this was supposed to be x == 5
    // but it assigns 5 to x and evaluates as true
}

Monk makes assignment a statement. You cannot write if x = 5 — the parser rejects it because a statement can't appear where an expression is expected. This eliminates the bug entirely at the grammar level.

// Monk: assignment is a statement
x = 5          // fine, standalone statement
if x == 5 {    // fine, comparison expression
    // ...
}
if x = 5 {     // PARSE ERROR: expected expression, got assignment
    // ...
}

Trade-off acknowledged. You can't write while (ch = read()) in Monk. You need a separate assignment before the condition. Monk chose safety over brevity. Python, Go, and Rust made the same choice.

Expression nodes: 15 types.

Every expression in Monk maps to one of these AST node types. Here's a real Monk program that touches all 15 — the table below maps each highlighted construct to its node type.

// Literals: NumberExpr, StringExpr, BoolExpr, NoneExpr
let count int = 42              // NumberExpr(42)
let pi = 3.14                   // NumberExpr(3.14)
let name = "Monk"               // StringExpr
let greeting = `Hello ${name}!`  // StringExpr (template literal)
let done = false                // BoolExpr
let nothing = none              // NoneExpr

// Variable reference: IdentExpr
let doubled = count             // IdentExpr("count")

// Prefix operators: UnaryExpr
let neg = -count                // UnaryExpr(-, IdentExpr)
let inv = !done                 // UnaryExpr(!, IdentExpr)

// Arithmetic and comparison: BinaryExpr
let sum = count + 8             // BinaryExpr(+, IdentExpr, NumberExpr)
let big = count > 10            // BinaryExpr(>, IdentExpr, NumberExpr)

// Grouping: GroupExpr
let result = (count + 8) * 2   // GroupExpr wraps the BinaryExpr

// Collection literals: ArrayExpr, RecordExpr
let scores = [10, 20, 30]      // ArrayExpr([NumberExpr, NumberExpr, NumberExpr])
let point = {x: 10, y: 20}    // RecordExpr([field: NumberExpr, ...])

// Access: IndexExpr, PropertyExpr
let first = scores[0]          // IndexExpr(IdentExpr, NumberExpr)
let px = point.x               // PropertyExpr(IdentExpr, "x")

// Functions and calls: FunctionExpr, CallExpr
const add = (a int, b int) int { return a + b }  // FunctionExpr
let total = add(count, 8)                         // CallExpr(IdentExpr, [IdentExpr, NumberExpr])

// Error: ThrowExpr
const safe_div = (a int, b int) int {
    if b == 0 { return throw "division by zero" } // ThrowExpr("division by zero")
    return a / b
}

Every expression in Monk maps to one of these AST node types:

Literals (leaf nodes, no children)
NumberExpr 42, 3.14, 0xFF Integer or float value
StringExpr "hello", `world` String value
BoolExpr true, false Boolean value
NoneExpr none The absence of a value
IdentExpr x, name, counter A variable reference
Operations (inner nodes, have children)
UnaryExpr -x, !done, ~mask Prefix operator + one operand
BinaryExpr a + b, x == y Left operand + operator + right operand
CallExpr add(1, 2) Function expression + argument list
IndexExpr arr[0] Collection + index expression
PropertyExpr record.field Object + field name
Compound expressions
ArrayExpr [1, 2, 3] List of element expressions
RecordExpr {name: "Monk", v: 2} List of key-value pairs
FunctionExpr (a int) int { ... } Params + return type + body
ThrowExpr throw "error" Error value expression
GroupExpr (a + b) Parenthesized sub-expression

Functions are values, not declarations.

In many languages, function declarations are special syntax at the statement level. Monk takes a different approach: a function literal is an expression, just like a number or a string. You assign it to a variable with let or const.

// The function literal (a int, b int) int { return a + b }
// is an expression, just like 42 or "hello"
const add = (a int, b int) int {
    return a + b
}

// You can pass them directly as arguments
let nums = [3, 1, 2]
sort(nums, (a int, b int) int { return a - b })

In the AST, a FunctionExpr sits in the same position any other expression would. The parser treats (a int, b int) int { return a + b } the same way it treats 42 — it's a value that gets assigned, passed, or returned.

No func or function keyword. There's no function declaration syntax in Monk. If it starts with ( and has parameters with types, it's a function expression. This keeps the grammar simpler — one fewer statement type to parse.

Statement nodes: 14 types.

Declarations
VarDecl let x int = 42 Name + optional type + optional init
TypeDecl type Point { x int, y int } Named record shape
Assignment
AssignStmt x = 10, arr[0] = 5 Target + value expression
Control flow
IfStmt if cond { } else { } Condition + then block + optional else
WhileStmt while cond { } Condition + body
ForStmt for item in list { } Variable + iterable + body
ReturnStmt return x + 1 Optional return value
BreakStmt break Exit innermost loop
ContinueStmt continue Skip to next iteration
Error handling & modules
GuardStmt guard { } against e { } Try block + error handler
ExprStmt doSomething() Expression used as statement
BlockStmt { ... } Sequence of statements
UseStmt use math Import a module
ExportStmt export const pi = 3.14 Make declaration public

Note ExprStmt — the bridge between the two worlds. When you write print("hello") as a standalone line, the parser wraps the call expression in an ExprStmt. The expression produces a value (the return value of print), but it's used as a statement — the value is discarded.

Every node knows where it came from.

Every AST node embeds a position: Pos{Line, Column}. This is the location in the source file where the node begins.

Why does this matter? Because every phase after parsing needs to point back to the source when something goes wrong:

1

Type checker

"Type mismatch at line 12, column 5" — uses the AST node's position.

2

Code generator

Emits #line 12 "main.monk" directives in the generated C code, so runtime errors map back to Monk source.

3

LSP (future)

"Go to definition" needs to know where each identifier was defined.

Lesson learned the hard way. Position tracking was added to Monk's AST after the initial implementation. Without it, the code generator couldn't emit #line directives, making runtime errors nearly impossible to debug. Add positions from the start.

Type annotations: space-separated, not colon-separated.

Most languages with type annotations use a colon: let x: int = 42. Monk uses space separation instead: let x int = 42.

// Monk's type syntax
let count int = 0
let names string[] = ["Alice", "Bob"]
let maybe int? = none
const greet = (name string) string {
    return "Hello, " + name
}

In the AST, the type is stored as a string identifier. Array types get a [] suffix, optional types get a ? suffix. The parser reads the type name as an identifier, then checks for these suffixes. No colon token needed — one fewer token to lex and parse.

A full program, fully parsed.

const double = (n int) int {
    return n * 2
}
let result = double(21)

The complete AST:

Program

VarDecl (const, name: "double")

init: FunctionExpr

params: [("n", type: "int")]

returnType: "int"

body: BlockStmt

ReturnStmt

value: BinaryExpr (*)

left: IdentExpr ("n")

right: NumberExpr (2)

VarDecl (let, name: "result")

init: CallExpr

func: IdentExpr ("double")

args: [NumberExpr (21)]

Everything the compiler needs to generate code is in this tree. Variable names, types, function bodies, call arguments. The source text is no longer needed.

Key takeaways

1

Expressions produce values, statements don't. Assignment is a statement in Monk — prevents if x = 5 bugs.

2

29 node types: 15 expressions + 14 statements. Each is a struct with typed fields, not a generic "node" with string properties.

3

Functions are expressions assigned to variables. No function declaration syntax, no func keyword.

4

Every node embeds Pos{Line, Column}. Add position tracking from day one — you'll need it for error messages and source mapping.