Lesson 2.4

Building the Parser

Recursive descent, one grammar rule at a time. How to turn tokens into trees with TDD.


One function per grammar rule.

Recursive descent is a direct translation from grammar to code. Each grammar rule becomes a function. The function reads the tokens it expects, calls other rule functions for sub-parts, and returns an AST node.

The build order mirrors how you'd teach someone to read Monk: start with the simplest possible expressions, add operators, then statements, then compound structures. At every step, write a failing test first.

1. Primary expressions: numbers, strings, booleans, identifiers, none
2. Unary and binary expressions: operators with precedence climbing
3. Call, index, property access: postfix operators on expressions
4. Variable declarations: let/const with optional types
5. Assignment: simple and compound (+=, -=, etc.)
6. Control flow: if/else, while, for-in
7. Functions: parameter lists, return types, bodies
8. Arrays and records: collection literals with trailing commas
9. Error handling: guard/against/throw
10. Modules: use, export, type declarations

1. The parser skeleton.

The parser holds a list of tokens (from the lexer), a cursor position, and the current token. Three helper methods do most of the work:

peek()

Look at the current token without consuming it. Used to decide which parsing function to call.

advance()

Consume the current token and move to the next. Returns the consumed token.

expect(type)

Assert the current token is of the given type. If it matches, advance. If not, report a parse error with the token's position.
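These three helpers are small enough to sketch in full. Here's a minimal Go version, assuming a Token struct with Type and Literal fields and an error list on the parser; the names are illustrative, not necessarily Monk's actual implementation:

```go
package main

import "fmt"

type TokenType int

const (
	EOF TokenType = iota
	IDENT
	ASSIGN
	INT
)

type Token struct {
	Type    TokenType
	Literal string
}

// Parser holds the token list, a cursor, and collected errors.
type Parser struct {
	tokens []Token
	pos    int
	errors []string
}

// peek looks at the current token without consuming it.
func (p *Parser) peek() Token { return p.tokens[p.pos] }

// advance consumes the current token and returns it,
// staying parked on EOF at the end of input.
func (p *Parser) advance() Token {
	tok := p.tokens[p.pos]
	if p.pos < len(p.tokens)-1 {
		p.pos++
	}
	return tok
}

// expect asserts the current token's type: advance on a match,
// record a parse error with the position otherwise.
func (p *Parser) expect(t TokenType) Token {
	if p.peek().Type == t {
		return p.advance()
	}
	p.errors = append(p.errors,
		fmt.Sprintf("expected token type %d, got %q at position %d", t, p.peek().Literal, p.pos))
	return p.peek()
}

func main() {
	p := &Parser{tokens: []Token{{IDENT, "x"}, {ASSIGN, "="}, {INT, "42"}, {EOF, ""}}}
	fmt.Println(p.expect(IDENT).Literal) // x
	fmt.Println(p.advance().Literal)     // =
	fmt.Println(p.peek().Literal)        // 42
}
```

Every parsing function in the rest of this lesson is written in terms of these three calls.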

The main loop calls parseStatement() repeatedly until it hits EOF, collecting statements into a Program node:

func Parse(tokens []Token) Program {
    statements := []
    while peek() is not EOF {
        stmt := parseStatement()
        statements = append(statements, stmt)
    }
    return Program{Statements: statements}
}

2. Parsing expressions with precedence climbing.

The expression parser is a chain of functions. Each level handles its operators and delegates to the next level for operands. Start at the bottom and build up:

// The entry point: parse any expression
func parseExpression() Expr {
    return parseLogicalOr()  // start at the lowest precedence
}

// Each level follows the same pattern
func parseLogicalOr() Expr {
    left := parseLogicalAnd()  // next level up
    while peek() is OR_OR or OR {
        op := advance()
        right := parseLogicalAnd()
        left = BinaryExpr{Left: left, Op: op, Right: right}
    }
    return left
}

// ... 8 more levels with identical structure ...

func parseMultiplication() Expr {
    left := parseUnary()
    while peek() is STAR or SLASH or PERCENT {
        op := advance()
        right := parseUnary()
        left = BinaryExpr{Left: left, Op: op, Right: right}
    }
    return left
}

// Unary is different: it's right-associative and recursive
func parseUnary() Expr {
    if peek() is MINUS or BANG or TILDE or NOT {
        op := advance()
        operand := parseUnary()  // recurse, not loop
        return UnaryExpr{Op: op, Operand: operand}
    }
    return parseCall()
}

Pattern recognition. Every binary level is the same structure: parse the higher level for the left side, loop while you see your operator, parse the higher level for the right side, wrap in BinaryExpr. Once you've written one, you've written them all.
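To watch the pattern run end to end, here's a self-contained two-level sketch in Go that handles only + and * over pre-split string tokens. All names are illustrative, and a real parser would work on typed tokens rather than strings:

```go
package main

import "fmt"

// Expr is either a number leaf or a binary node.
type Expr interface{ String() string }

type Num struct{ Val string }

type Binary struct {
	Op          string
	Left, Right Expr
}

func (n Num) String() string { return n.Val }
func (b Binary) String() string {
	return "(" + b.Left.String() + " " + b.Op + " " + b.Right.String() + ")"
}

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) advance() string { t := p.peek(); p.pos++; return t }

// Lowest precedence: parse the higher level for the left side, loop
// while the operator matches, wrap in a Binary node.
func (p *parser) parseAddition() Expr {
	left := p.parseMultiplication()
	for p.peek() == "+" {
		op := p.advance()
		left = Binary{Op: op, Left: left, Right: p.parseMultiplication()}
	}
	return left
}

// Identical structure, one level up in precedence.
func (p *parser) parseMultiplication() Expr {
	left := p.parsePrimary()
	for p.peek() == "*" {
		op := p.advance()
		left = Binary{Op: op, Left: left, Right: p.parsePrimary()}
	}
	return left
}

// The base case: every remaining token is treated as a number.
func (p *parser) parsePrimary() Expr { return Num{Val: p.advance()} }

func main() {
	p := &parser{toks: []string{"1", "+", "2", "*", "3"}}
	fmt.Println(p.parseAddition()) // (1 + (2 * 3))
}
```

Because parseAddition() delegates its operands to parseMultiplication(), 1 + 2 * 3 comes back as (1 + (2 * 3)): the higher-precedence operator binds tighter without any explicit precedence table.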

3. Primary expressions: the base case.

At the bottom of the precedence chain is parsePrimary() — the function that handles literals, identifiers, and parenthesized groups. This is where recursion bottoms out.

func parsePrimary() Expr {
    switch peek().Type {
    case INT, FLOAT:
        return NumberExpr{Value: advance().Literal}
    case STRING:
        return StringExpr{Value: advance().Literal}
    case TRUE, FALSE:
        return BoolExpr{Value: advance().Type == TRUE}
    case NONE:
        advance()
        return NoneExpr{}
    case IDENT:
        return IdentExpr{Name: advance().Literal}
    case LPAREN:
        advance()  // consume (
        // Could be a grouped expression or a function literal
        // The parser needs to look ahead to decide
        expr := parseExpression()
        expect(RPAREN)
        return GroupExpr{Expr: expr}
    case LBRACKET:
        return parseArrayLiteral()
    case LBRACE:
        return parseRecordLiteral()
    default:
        error("expected expression, got " + peek().Literal)
    }
}

The tricky case: ( could start a grouped expression like (a + b) or a function literal like (x int) int { ... }. The parser needs to look ahead — if there's an identifier followed by a type, it's a function. This is the kind of ambiguity that makes hand-written parsers earn their keep.

4. Parsing statements: the top-level switch.

Statement parsing starts with a switch on the current token. Each keyword gets its own parsing function:

func parseStatement() Stmt {
    switch peek().Type {
    case LET, CONST:
        return parseVarDecl()
    case IF:
        return parseIfStmt()
    case WHILE:
        return parseWhileStmt()
    case FOR:
        return parseForStmt()
    case RETURN:
        return parseReturnStmt()
    case BREAK:
        return parseBreakStmt()
    case CONTINUE:
        return parseContinueStmt()
    case GUARD:
        return parseGuardStmt()
    case TYPE:
        return parseTypeDecl()
    case USE:
        return parseUseStmt()
    case EXPORT:
        return parseExportStmt()
    default:
        // Not a statement keyword — try parsing as expression
        return parseExpressionStatement()
    }
}

The default case is important. When the token doesn't match any statement keyword, it might be an expression statement (a function call, for example) or an assignment. The parser tries parsing an expression first, then checks if the next token is = or a compound assignment operator.
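One way that default case can be structured is sketched below. String tokens stand in for real Token values, and parseExpression() is reduced to a single-token stand-in to keep the example short:

```go
package main

import "fmt"

type Stmt interface{}

// ExprStmt wraps a bare expression used as a statement.
type ExprStmt struct{ Expr string }

// AssignStmt covers = and the compound operators.
type AssignStmt struct {
	Target, Op, Value string
}

// parseExpressionStatement parses an expression first, then checks
// whether an assignment operator follows.
func parseExpressionStatement(toks []string) Stmt {
	expr := toks[0] // stand-in for parseExpression()
	if len(toks) >= 3 {
		switch toks[1] {
		case "=", "+=", "-=", "*=", "/=":
			// The expression already parsed becomes the assignment target.
			return AssignStmt{Target: expr, Op: toks[1], Value: toks[2]}
		}
	}
	return ExprStmt{Expr: expr}
}

func main() {
	fmt.Printf("%+v\n", parseExpressionStatement([]string{"x", "+=", "1"}))
	fmt.Printf("%+v\n", parseExpressionStatement([]string{"print"}))
}
```

The key move is the order of operations: parse the expression first, then look at the next token to decide whether it was a statement on its own or the left-hand side of an assignment.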

5. Variable declarations: the details.

A variable declaration has several optional parts:

let x = 42              // no type
let x int = 42          // with type
let x int               // no initializer (will be zero-valued)
const PI float = 3.14   // const with type

The parsing logic handles each case:

func parseVarDecl() VarDecl {
    isConst := advance().Type == CONST  // consume let/const
    name := expect(IDENT)               // variable name

    // Optional type annotation (space-separated)
    var typeName string
    if peek() is IDENT {
        typeName = advance().Literal
        // Check for array suffix [] or optional suffix ?
        if peek() is LBRACKET { ... }
        if peek() is QUESTION { ... }
    }

    // Optional initializer
    var init Expr
    if peek() is ASSIGN {
        advance()  // consume =
        init = parseExpression()
    }

    return VarDecl{
        IsConst: isConst,
        Name: name,
        Type: typeName,
        Init: init,
    }
}

The type-vs-equals ambiguity. After the variable name, the next token could be a type (int) or =. Since type names are identifiers and = is a distinct token, there's no real ambiguity — but the parser must check before consuming the identifier as a type.

6. Parsing function expressions.

Function literals are the most complex expression to parse. They start with (, which is ambiguous — is this a grouped expression or a function? The parser must look ahead.

The heuristic: if the token after ( is ) (empty params), or if there's an identifier followed by a type identifier, it's a function. Otherwise, it's a grouped expression.
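Because the decision only needs token types, the lookahead can be sketched as a pure function over the stream. This is an illustrative Go version of the heuristic above, with bounds checks omitted for brevity; a real parser would guard against running off the end:

```go
package main

import "fmt"

type TokenType int

const (
	LPAREN TokenType = iota
	RPAREN
	IDENT
	PLUS
	EOF
)

// looksLikeFunction reports whether a token stream whose token at pos
// is LPAREN begins a function literal rather than a grouped expression.
func looksLikeFunction(toks []TokenType, pos int) bool {
	if toks[pos+1] == RPAREN {
		return true // (): an empty parameter list
	}
	// An identifier followed by a type identifier: (a int, ...)
	return toks[pos+1] == IDENT && toks[pos+2] == IDENT
}

func main() {
	// (a int) ... is a function literal
	fmt.Println(looksLikeFunction([]TokenType{LPAREN, IDENT, IDENT, RPAREN, EOF}, 0)) // true
	// (a + b) is a grouped expression
	fmt.Println(looksLikeFunction([]TokenType{LPAREN, IDENT, PLUS, IDENT, RPAREN, EOF}, 0)) // false
}
```

Two tokens of lookahead are enough here precisely because Monk's parameter syntax is `name type`: an identifier pair can't begin a grouped expression.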

// Function: (a int, b int) int { return a + b }
// Parts:    params          return  body
func parseFunctionExpr() FunctionExpr {
    expect(LPAREN)

    // Parse parameter list
    params := []
    while peek() is not RPAREN {
        name := expect(IDENT)
        typeName := parseType()
        params = append(params, Param{Name: name, Type: typeName})
        if peek() is COMMA { advance() }  // trailing comma OK
    }
    expect(RPAREN)

    // Optional return type
    returnType := ""
    if peek() is IDENT {
        returnType = parseType()
    }

    // Body block
    body := parseBlock()

    return FunctionExpr{Params: params, ReturnType: returnType, Body: body}
}

7. Control flow: if, while, for-in.

Control flow statements follow predictable patterns. Each one reads its keyword, parses a condition or binding, then parses a block body:

func parseIfStmt() IfStmt {
    advance()  // consume "if"
    condition := parseExpression()
    thenBlock := parseBlock()

    var elseBlock *BlockStmt
    if peek() is ELSE {
        advance()  // consume "else"
        if peek() is IF {
            // "else if" chains: parse as nested IfStmt wrapped in a block
            elseBlock = BlockStmt{parseIfStmt()}
        } else {
            elseBlock = parseBlock()
        }
    }
    return IfStmt{Condition: condition, Then: thenBlock, Else: elseBlock}
}

func parseForStmt() ForStmt {
    advance()         // consume "for"
    variable := expect(IDENT)
    expect(IN)        // consume "in"
    iterable := parseExpression()
    body := parseBlock()
    return ForStmt{Var: variable, Iterable: iterable, Body: body}
}

Notice how else if isn't a special case in the grammar. An else branch just contains another if statement. The parser handles chaining naturally through recursion.

8. Error recovery: don't stop at the first problem.

When the parser hits an unexpected token, it has two choices: crash, or recover and keep going. A good parser recovers. The strategy is called synchronization — skip tokens until you find a safe point to resume parsing.

Safe points are statement boundaries: keywords like let, const, if, return, or a closing brace }. When recovery kicks in, the parser discards the current broken statement and starts fresh at the next one.

func synchronize() {
    // Skip tokens until we find a statement boundary
    while peek() is not EOF {
        // These tokens typically start new statements
        if peek() is LET or CONST or IF or WHILE or
           FOR or RETURN or GUARD or TYPE or USE {
            return  // resume parsing here
        }
        advance()  // discard and move on
    }
}

This means a file with three errors will report all three, not just the first. The parser collects errors into a list and returns them alongside whatever AST it managed to build.

The 80/20 of error recovery. Synchronizing on statement boundaries catches most real-world errors. More sophisticated recovery (inserting missing tokens, suggesting fixes) is possible but rarely worth the complexity for a first implementation.
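The loop that ties reporting and synchronization together can be sketched like this (string tokens, and a deliberately toy parseAll() that treats any boundary keyword plus one token as a well-formed statement, just to show the error list growing past the first problem):

```go
package main

import "fmt"

// Tokens that typically begin a statement; synchronize() stops here.
var boundaries = map[string]bool{
	"let": true, "const": true, "if": true, "while": true,
	"for": true, "return": true, "guard": true, "type": true, "use": true,
}

type parser struct {
	toks   []string
	pos    int
	errors []string
}

// synchronize discards tokens until the next statement boundary.
func (p *parser) synchronize() {
	for p.pos < len(p.toks) && !boundaries[p.toks[p.pos]] {
		p.pos++
	}
}

// parseAll records every error it finds, recovering after each one.
func (p *parser) parseAll() {
	for p.pos < len(p.toks) {
		tok := p.toks[p.pos]
		if boundaries[tok] {
			p.pos += 2 // pretend to parse a valid two-token statement
			continue
		}
		p.errors = append(p.errors, fmt.Sprintf("unexpected token %q", tok))
		p.synchronize()
	}
}

func main() {
	// Two broken spots: "@ #" after "let x", and "%" after "let y".
	p := &parser{toks: []string{"let", "x", "@", "#", "let", "y", "%", "return", "z"}}
	p.parseAll()
	fmt.Println(len(p.errors)) // 2: both errors reported, not just the first
}
```

The shape is what matters: record the error, call synchronize(), and let the main loop continue from the next boundary.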

9. TDD: one grammar rule at a time.

The parser is ideal for test-driven development because each grammar rule is independent. The test for parsing if statements doesn't affect the test for parsing while loops.

The pattern for every test:

func TestParseVarDecl(t *testing.T) {
    // 1. Lex the source
    tokens := Scan("let x int = 42")

    // 2. Parse
    program := Parse(tokens)

    // 3. Assert the AST shape
    assert(len(program.Statements) == 1)

    varDecl := program.Statements[0].(VarDecl)
    assert(varDecl.Name == "x")
    assert(varDecl.Type == "int")
    assert(varDecl.IsConst == false)

    init := varDecl.Init.(NumberExpr)
    assert(init.Value == "42")
}

Build up gradually. Start with 42 (just a number). Then 1 + 2 (binary expression). Then 1 + 2 * 3 (precedence). Then let x = 1 + 2 * 3 (declaration wrapping an expression). Each test builds on the last.

Parser test challenge

Write a test that parses let x = 1 + 2 * 3 and verifies the AST has the correct shape:

VarDecl (name: "x", no type)
  init: BinaryExpr (+)
    left: NumberExpr (1)
    right: BinaryExpr (*)
      left: NumberExpr (2)
      right: NumberExpr (3)

If the multiplication is a child of the addition (not the other way around), your precedence climbing is working correctly.

Trace: parsing a real Monk program.

Reading about parsing functions is one thing. Watching them work on real input is another. Here's what happens when the parser sees this single line:

let result = fibonacci(10)

1. parseStatement() · peek() → LET keyword
Dispatches to parseVarDecl()

2. parseVarDecl() · advance() consumes "let"
Reads name: IdentToken("result"). Next token is "=" not an ident → no type annotation.

3. parseVarDecl() · advance() consumes "="
Calls parseExpression() for the init value.

4. parseExpression() · → parseLogicalOr() → … → parsePrimary()
Bottoms out at a primary: IdentToken("fibonacci") → IdentExpr("fibonacci").

5. postfix loop · peek() is LPAREN
It's a call: advances past (, enters parseArgs().

6. parseArgs() · parsePrimary() for first arg
NumberToken(10) → NumberExpr(10). Next is RPAREN, stops.

7. parseExpression() · Returns CallExpr
CallExpr(callee: IdentExpr("fibonacci"), args: [NumberExpr(10)]).

8. parseVarDecl() · Assembles final node
VarDeclStmt(name: "result", isConst: false, type: nil, init: CallExpr(…)).

The final AST node handed to the type checker and codegen:

VarDeclStmt (name: "result", isConst: false, type: nil)
  init: CallExpr
    callee: IdentExpr ("fibonacci")
    args[0]: NumberExpr (10)

Each step in the trace corresponds to exactly one function in parse_stmt.go or parse_expr.go. One grammar rule, one function — that's recursive descent.
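The postfix loop and parseArgs() from steps 5 and 6 never appeared in the earlier listings. Here's an illustrative Go sketch of how call parsing typically wraps the primary level, using string tokens and identifiers standing in for full expressions:

```go
package main

import (
	"fmt"
	"strings"
)

type Expr interface{ String() string }

type Ident struct{ Name string }

type Call struct {
	Callee Expr
	Args   []Expr
}

func (i Ident) String() string { return i.Name }
func (c Call) String() string {
	parts := make([]string, len(c.Args))
	for j, a := range c.Args {
		parts[j] = a.String()
	}
	return c.Callee.String() + "(" + strings.Join(parts, ", ") + ")"
}

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) advance() string { t := p.peek(); p.pos++; return t }

// parseCall: parse a primary, then loop on postfix operators.
// Only call syntax is shown; index [i] and property .x access would
// be extra cases in the same loop.
func (p *parser) parseCall() Expr {
	var expr Expr = Ident{Name: p.advance()} // stand-in for parsePrimary()
	for p.peek() == "(" {
		p.advance() // consume (
		expr = Call{Callee: expr, Args: p.parseArgs()}
	}
	return expr
}

// parseArgs: comma-separated expressions up to the closing paren.
func (p *parser) parseArgs() []Expr {
	args := []Expr{}
	for p.peek() != ")" && p.peek() != "" {
		args = append(args, Ident{Name: p.advance()}) // stand-in for parseExpression()
		if p.peek() == "," {
			p.advance()
		}
	}
	p.advance() // consume )
	return args
}

func main() {
	p := &parser{toks: []string{"fibonacci", "(", "10", ")"}}
	fmt.Println(p.parseCall()) // fibonacci(10)
}
```

Because the loop keeps wrapping the expression it already has, chained postfix forms like f(1)(2) fall out for free: the first CallExpr becomes the callee of the second.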

Key takeaways

1. Recursive descent: one function per grammar rule. Expressions use precedence climbing (chain of functions). Statements use a switch on the token type.

2. Three helper methods do most of the work: peek(), advance(), expect().

3. Error recovery synchronizes on statement boundaries: skip broken tokens, resume at the next keyword. Report all errors, not just the first.

4. TDD is natural here. Each grammar rule gets its own test. Build up from 42 to full programs incrementally.