Lesson 2.1

What Is a Parser?

Tokens in, tree out. The parser gives your program structure.

concepts 12 min read

Sentences have structure you already parse.

Consider the sentence: "The cat sat on the mat." You don't process it as six isolated words. Your brain groups them: "The cat" is the subject, "sat" is the verb, "on the mat" is a prepositional phrase. You build a mental tree, even if you don't realize it.

A parser does the same thing for code. The lexer already split the source into tokens — words. Now the parser figures out how those words relate: which expression is the argument, which is the condition, where does the block end. It builds a tree that captures these relationships.

This tree is called an Abstract Syntax Tree (AST). "Abstract" because it drops irrelevant details — parentheses, commas, whitespace — and keeps only the structure that matters for understanding the program.

Why a tree and not a list.

Here's the problem. The lexer gives us a flat stream of tokens:

INT 1PLUSINT 2STARINT 3

Is this (1 + 2) * 3 = 9 or 1 + (2 * 3) = 7? The flat token list doesn't tell you. Both readings use the same tokens in the same order. You need structure to encode which operation happens first.

A tree solves this. In the correct tree, the multiplication is deeper — it's a child of the addition node. Deeper nodes evaluate first. So the tree shape itself encodes the precedence rules.

AST for 1 + 2 * 3 (multiplication binds tighter)

BinaryExpr (+)

NumberExpr (1)

BinaryExpr (*)

NumberExpr (2)

NumberExpr (3)

The tree shape IS the meaning. To evaluate, walk the tree bottom-up. The deepest nodes get computed first. 2 * 3 = 6 first, then 1 + 6 = 7. No precedence rules needed at evaluation time — they're already baked into the tree structure.

What the parser does, step by step.

The parser reads tokens one at a time from the lexer and builds tree nodes. Here's a Monk variable declaration going through the pipeline:

let result = 1 + 2 * 3

See LET token

This is a variable declaration. Expect an identifier next, then optionally a type, then =, then an expression.

Read IDENT "result"

The variable name. Store it in the AST node being built.

Read ASSIGN

Confirms this is a declaration with an initializer. Now parse the expression on the right side.

Parse expression 1 + 2 * 3

This recursively builds the tree shown above — a BinaryExpr(+) with NumberExpr(1) and BinaryExpr(*) as children.

The result: a VarDecl node with the name "result", no type annotation, and the expression tree as its initializer. One clean tree node that captures everything the compiler needs.

Syntax vs semantics: what the parser ignores.

The parser checks syntax — is the program structurally valid? Are the tokens in a legal order? It does NOT check semantics — does the program make sense?

Parser accepts this:

let x int = "hello" — syntactically valid. Type mismatch is a semantic error, caught later.

Parser accepts this:

let y = x + z — syntactically valid. Whether x and z exist is a semantic question.

Parser rejects this:

let = 42 — missing variable name. Syntax error.

Parser rejects this:

1 + * 3 — two operators in a row. No valid grammar rule matches this.

Two ways to build a parser.

There are two broad approaches. Monk uses the first.

Monk's choice

Hand-written recursive descent

You write a function for each grammar rule. parseStatement() calls parseIfStatement() or parseVarDecl() depending on the token. Each function can call others recursively. Full control over error messages, easy to debug, easy to extend.

Parser generators (yacc, bison, ANTLR)

You write a grammar file, and a tool generates the parser for you. Fast to get started, but the generated code is opaque. When something goes wrong, you're debugging someone else's code. Error messages are often generic and unhelpful.

Most modern languages (Go, Rust, Swift, TypeScript) use hand-written parsers. The control over error reporting is worth the extra code.

A bigger example: parsing an if statement.

if x > 10 {
    let msg = "big"
} else {
    let msg = "small"
}

The parser sees the IF token and knows the structure: condition, then block, then optionally ELSE and another block. It builds:

IfStmt

condition: BinaryExpr (>)

left: IdentExpr ("x")

right: NumberExpr (10)

then: BlockStmt

VarDecl (msg = StringExpr("big"))

else: BlockStmt

VarDecl (msg = StringExpr("small"))

Notice what's gone: the curly braces, the if and else keywords, the newlines. The tree keeps only the meaning. The braces told the parser where blocks start and end — once the tree is built, they're no longer needed.

Key takeaways

A parser turns a flat token stream into a tree (AST). The tree structure encodes precedence, grouping, and nesting.

The parser checks syntax (structure), not semantics (meaning). Type mismatches and undefined variables are someone else's problem.

Deeper nodes in the tree evaluate first. The tree shape IS the precedence.

Monk uses a hand-written recursive descent parser — one function per grammar rule. Most modern languages do the same.