Lesson 4.5
References & Further Reading
Compile-to-C strategies, code generation patterns, and source mapping.
The essential reads.
Crafting Interpreters -- Chapters 14-30
by Robert Nystrom
The second half of the book builds a bytecode compiler and virtual machine in C. Monk doesn't target bytecode, but the code generation principles are the same: walking an AST, emitting instructions, managing local variables, and handling function compilation. Chapters 22-26 (local variables, control flow, functions, closures, garbage collection) are especially relevant to understanding the trade-offs Monk makes.
craftinginterpreters.com/chunks-of-bytecode.html
A Retargetable C Compiler: Design and Implementation
by Christopher Fraser and David Hanson
The book behind lcc, the classic retargetable compiler for ANSI C. Fraser and Hanson walk through a complete C compiler whose front end hands a compact intermediate representation to interchangeable back ends. lcc compiles C rather than emitting it, but the architecture -- front end, IR, code generation -- maps directly to what Monk does, just at a much larger scale. The treatment of the interface between the front end and the code generators is particularly instructive.
Engineering a Compiler -- Chapter 7: Code Shape
by Keith Cooper and Linda Torczon
Chapter 7 covers how a compiler decides what shape the generated code should take -- how to translate boolean expressions, array accesses, function calls, and control flow into lower-level constructs. The "code shape" concept is exactly what Monk's translation rules implement: each AST node type maps to a specific pattern of C code.
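The "one node type, one C pattern" idea can be sketched in a few lines. This is a minimal illustration in Python, not Monk's actual code generator: the node classes (Num, Var, BinOp, Assign, If) and the functions emit_expr and emit_stmt are hypothetical names chosen for the example, and the lesson does not say which language Monk's compiler is written in.

```python
from dataclasses import dataclass


# Hypothetical AST node types; Monk's real node set is not shown in this lesson.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    name: str
    value: object


@dataclass
class If:
    cond: object
    then_body: list
    else_body: list


def emit_expr(node) -> str:
    """Map each expression node type to one fixed pattern of C text."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, BinOp):
        # Always parenthesize, so C operator precedence cannot change the meaning.
        return f"({emit_expr(node.left)} {node.op} {emit_expr(node.right)})"
    raise TypeError(f"unknown expression node: {node!r}")


def emit_stmt(node, indent: int = 0) -> list:
    """Map each statement node type to a list of C source lines."""
    pad = "    " * indent
    if isinstance(node, Assign):
        return [f"{pad}{node.name} = {emit_expr(node.value)};"]
    if isinstance(node, If):
        lines = [f"{pad}if ({emit_expr(node.cond)}) {{"]
        for stmt in node.then_body:
            lines += emit_stmt(stmt, indent + 1)
        lines.append(f"{pad}}} else {{")
        for stmt in node.else_body:
            lines += emit_stmt(stmt, indent + 1)
        lines.append(f"{pad}}}")
        return lines
    raise TypeError(f"unknown statement node: {node!r}")
```

For example, emit_expr(BinOp("+", Num(1), BinOp("*", Var("x"), Num(2)))) returns "(1 + (x * 2))". The redundant parentheses are a deliberate code-shape choice in this sketch: the emitter stays trivial, and the C compiler discards them at no cost.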
Languages that compile to C.
Monk is not alone in targeting C. Several successful languages took the same path, each for different reasons.
Chicken Scheme
A Scheme implementation that compiles to C using a continuation-passing style with Cheney on the MTA (a clever garbage collection technique that uses the C stack as a nursery). Demonstrates that compile-to-C can be both portable and performant. The generated C is less readable than Monk's, but the compilation strategy is remarkable.
call-cc.org
Cython
Compiles a Python-like language to C, bridging Python's ease of use with C's performance. Like Monk, it generates C that calls into a runtime library (CPython's), though the output is far more verbose. The translation rules are more complex because Cython must interop with Python objects, but the architecture is structurally similar.
cython.org
Nim
Nim compiles to C (and also to C++, Objective-C, and JavaScript). Its C backend produces surprisingly readable code. Worth studying to see how a more mature language handles the compile-to-C approach, especially for generics, closures, and garbage collection.
nim-lang.org
Source mapping and debugging.
GCC #line Directive Documentation
The official documentation for the #line preprocessor directive. Covers the exact syntax, behavior with warnings and errors, and interaction with debuggers. Short and precise.
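As a rough illustration of how a code generator might use the directive, the sketch below interleaves #line markers with emitted C statements. It assumes an earlier pass has produced (source_line, c_text) pairs. The function name emit_with_line_directives and the file name main.monk are hypothetical and not taken from Monk.

```python
def emit_with_line_directives(statements, source_name):
    """Join emitted C statements, inserting #line only where the C compiler's
    own line counter would drift away from the original source line.

    statements: list of (source_line, c_text) pairs (a hypothetical input shape).
    """
    # Escape backslashes and quotes so the file name is a valid C string literal.
    quoted = source_name.replace("\\", "\\\\").replace('"', '\\"')
    out = []
    expected_line = None  # line number the C compiler will assign to the next line
    for source_line, c_text in statements:
        if source_line != expected_line:
            out.append(f'#line {source_line} "{quoted}"')
        out.append(c_text)
        # Every physical line emitted advances the compiler's counter by one.
        expected_line = source_line + c_text.count("\n") + 1
    return "\n".join(out) + "\n"
```

With statements on source lines 3, 4, and 9, this emits a directive before the first statement, none before the second (the counter already matches), and a fresh one before the third. Warnings and debugger line tables for the generated C then refer to the original file and line.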
Source Maps (JavaScript)
JavaScript's source map format solves the same problem as #line directives: mapping generated code back to source. The format is more complex (base64 VLQ encoding) because it maps columns as well as lines. Useful context for understanding why Monk's approach is simpler -- C's preprocessor gives us source mapping for free.
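To get a feel for the extra machinery, here is a minimal sketch of the base64 VLQ encoding used in the "mappings" field of version 3 source maps. The function names encode_vlq and encode_segment are made up for this example. A real source map also needs the surrounding JSON structure and relative-offset bookkeeping, which are omitted here.

```python
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_vlq(value: int) -> str:
    """Encode one signed integer as base64 VLQ digits."""
    # The sign is stored in the lowest bit, the magnitude in the bits above it.
    vlq = (abs(value) << 1) | (1 if value < 0 else 0)
    digits = []
    while True:
        digit = vlq & 0b11111      # take the low 5 bits
        vlq >>= 5
        if vlq:
            digit |= 0b100000      # continuation bit: more digits follow
        digits.append(BASE64_ALPHABET[digit])
        if not vlq:
            break
    return "".join(digits)


def encode_segment(fields) -> str:
    """Encode one mapping segment, e.g. [generated column, source index,
    source line, source column], each given relative to the previous segment."""
    return "".join(encode_vlq(field) for field in fields)
```

For instance, encode_vlq(16) yields "gB" and encode_segment([0, 0, 0, 0]) yields "AAAA". Compare that with a single textual #line directive, which needs no encoder at all.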
Suggested reading order.
Crafting Interpreters Ch. 14-17 -- understand bytecode compilation. Even though Monk emits C, the concept of walking an AST and emitting instructions is identical.
Engineering a Compiler Ch. 7 -- code shape theory. What patterns of generated code work well, and why.
Chicken Scheme docs -- a successful compile-to-C language. See how they solved closures and GC.
GCC #line docs -- quick reference for source mapping, which you will use constantly.
Fraser & Hanson (lcc) -- only if you want to go deep on retargetable compiler architecture. Heavy but rewarding.