Lesson 2.3
Designing Monk's AST
Every node type. Expressions vs statements. The full tree design.
The fundamental split: expressions vs statements.
Every AST node is either an expression or a statement. The difference is simple: expressions produce a value, statements don't.
Expressions (produce a value)
42 — produces the number 42
a + b — produces their sum
greet("world") — produces the return value
(x int) int { ... } — produces a function
Statements (perform an action)
let x = 42 — declares a variable
x = 10 — assigns a value
if x > 0 { ... } — controls flow
return x — exits a function
This split matters because it determines where things can appear. An expression can be used anywhere a value is expected — as an argument, in a condition, on the right side of =. A statement cannot be used as a value.
The key decision: assignment is a statement.
In C and JavaScript, assignment is an expression. x = 5 both assigns and produces the value 5. This enables patterns like while (line = readLine()), but it also enables one of the most common bugs in programming:
// C / JavaScript: compiles fine, always true
if (x = 5) {
// this was supposed to be x == 5
// but it assigns 5 to x and evaluates as true
}
Monk makes assignment a statement. You cannot write if x = 5 — the parser rejects it because a statement can't appear where an expression is expected. This eliminates the bug entirely at the grammar level.
// Monk: assignment is a statement
x = 5 // fine, standalone statement
if x == 5 { // fine, comparison expression
// ...
}
if x = 5 { // PARSE ERROR: expected expression, got assignment
// ...
} Trade-off acknowledged. You can't write while (ch = read()) in Monk. You need a separate assignment before the condition. Monk chose safety over brevity. Python, Go, and Rust made the same choice.
Expression nodes: 15 types.
Every expression in Monk maps to one of these AST node types. Here's a real Monk program that touches all 15 — the table below maps each highlighted construct to its node type.
// Literals: NumberExpr, StringExpr, BoolExpr, NoneExpr
let count int = 42 // NumberExpr(42)
let pi = 3.14 // NumberExpr(3.14)
let name = "Monk" // StringExpr
let greeting = `Hello ${name}!` // StringExpr (template literal)
let done = false // BoolExpr
let nothing = none // NoneExpr
// Variable reference: IdentExpr
let doubled = count // IdentExpr("count")
// Prefix operators: UnaryExpr
let neg = -count // UnaryExpr(-, IdentExpr)
let inv = !done // UnaryExpr(!, IdentExpr)
// Arithmetic and comparison: BinaryExpr
let sum = count + 8 // BinaryExpr(+, IdentExpr, NumberExpr)
let big = count > 10 // BinaryExpr(>, IdentExpr, NumberExpr)
// Grouping: GroupExpr
let result = (count + 8) * 2 // GroupExpr wraps the BinaryExpr
// Collection literals: ArrayExpr, RecordExpr
let scores = [10, 20, 30] // ArrayExpr([NumberExpr, NumberExpr, NumberExpr])
let point = {x: 10, y: 20} // RecordExpr([field: NumberExpr, ...])
// Access: IndexExpr, PropertyExpr
let first = scores[0] // IndexExpr(IdentExpr, NumberExpr)
let px = point.x // PropertyExpr(IdentExpr, "x")
// Functions and calls: FunctionExpr, CallExpr
const add = (a int, b int) int { return a + b } // FunctionExpr
let total = add(count, 8) // CallExpr(IdentExpr, [IdentExpr, NumberExpr])
// Error: ThrowExpr
const safe_div = (a int, b int) int {
if b == 0 { return throw "division by zero" } // ThrowExpr("division by zero")
return a / b
} Every expression in Monk maps to one of these AST node types:
NumberExpr 42, 3.14, 0xFF Integer or float value StringExpr "hello", `world` String value BoolExpr true, false Boolean value NoneExpr none The absence of a value IdentExpr x, name, counter A variable reference UnaryExpr -x, !done, ~mask Prefix operator + one operand BinaryExpr a + b, x == y Left operand + operator + right operand CallExpr add(1, 2) Function expression + argument list IndexExpr arr[0] Collection + index expression PropertyExpr record.field Object + field name ArrayExpr [1, 2, 3] List of element expressions RecordExpr {name: "Monk", v: 2} List of key-value pairs FunctionExpr (a int) int { ... } Params + return type + body ThrowExpr throw "error" Error value expression GroupExpr (a + b) Parenthesized sub-expression Functions are values, not declarations.
In many languages, function declarations are special syntax at the statement level. Monk takes a different approach: a function literal is an expression, just like a number or a string. You assign it to a variable with let or const.
// The function literal (a int, b int) int { return a + b }
// is an expression, just like 42 or "hello"
const add = (a int, b int) int {
return a + b
}
// You can pass them directly as arguments
let nums = [3, 1, 2]
sort(nums, (a int, b int) int { return a - b })
In the AST, a FunctionExpr sits in the same position any other expression would. The parser treats (a int, b int) int { return a + b } the same way it treats 42 — it's a value that gets assigned, passed, or returned.
No func or function keyword. There's no function declaration syntax in Monk. If it starts with ( and has parameters with types, it's a function expression. This keeps the grammar simpler — one fewer statement type to parse.
Statement nodes: 14 types.
VarDecl let x int = 42 Name + optional type + optional init TypeDecl type Point { x int, y int } Named record shape AssignStmt x = 10, arr[0] = 5 Target + value expression IfStmt if cond { } else { } Condition + then block + optional else WhileStmt while cond { } Condition + body ForStmt for item in list { } Variable + iterable + body ReturnStmt return x + 1 Optional return value BreakStmt break Exit innermost loop ContinueStmt continue Skip to next iteration GuardStmt guard { } against e { } Try block + error handler ExprStmt doSomething() Expression used as statement BlockStmt { ... } Sequence of statements UseStmt use math Import a module ExportStmt export const pi = 3.14 Make declaration public
Note ExprStmt — the bridge between the two worlds. When you write print("hello") as a standalone line, the parser wraps the call expression in an ExprStmt. The expression produces a value (the return value of print), but it's used as a statement — the value is discarded.
Every node knows where it came from.
Every AST node embeds a position: Pos{Line, Column}. This is the location in the source file where the node begins.
Why does this matter? Because every phase after parsing needs to point back to the source when something goes wrong:
Type checker
"Type mismatch at line 12, column 5" — uses the AST node's position.
Code generator
Emits #line 12 "main.monk" directives in the generated C code, so runtime errors map back to Monk source.
LSP (future)
"Go to definition" needs to know where each identifier was defined.
Lesson learned the hard way. Position tracking was added to Monk's AST after the initial implementation. Without it, the code generator couldn't emit #line directives, making runtime errors nearly impossible to debug. Add positions from the start.
Type annotations: space-separated, not colon-separated.
Most languages with type annotations use a colon: let x: int = 42. Monk uses space separation instead: let x int = 42.
// Monk's type syntax
let count int = 0
let names string[] = ["Alice", "Bob"]
let maybe int? = none
const greet = (name string) string {
return "Hello, " + name
}
In the AST, the type is stored as a string identifier. Array types get a [] suffix, optional types get a ? suffix. The parser reads the type name as an identifier, then checks for these suffixes. No colon token needed — one fewer token to lex and parse.
A full program, fully parsed.
const double = (n int) int {
return n * 2
}
let result = double(21) The complete AST:
Program
VarDecl (const, name: "double")
init: FunctionExpr
params: [("n", type: "int")]
returnType: "int"
body: BlockStmt
ReturnStmt
value: BinaryExpr (*)
left: IdentExpr ("n")
right: NumberExpr (2)
VarDecl (let, name: "result")
init: CallExpr
func: IdentExpr ("double")
args: [NumberExpr (21)]
Everything the compiler needs to generate code is in this tree. Variable names, types, function bodies, call arguments. The source text is no longer needed.
Key takeaways
Expressions produce values, statements don't. Assignment is a statement in Monk — prevents if x = 5 bugs.
29 node types: 15 expressions + 14 statements. Each is a struct with typed fields, not a generic "node" with string properties.
Functions are expressions assigned to variables. No function declaration syntax, no func keyword.
Every node embeds Pos{Line, Column}. Add position tracking from day one — you'll need it for error messages and source mapping.