Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Code generation

The codegen module translates the high-level expression grammar into MiniRust basic blocks. Its entry point is codegen_function, which takes a function body (a Block) and produces a lang::Function containing a map of basic blocks, local variables, and an entry point.

CodeBlock: the control-flow builder

The core abstraction is CodeBlock — a single-entry, multi-exit control-flow code block that is incrementally constructed. It is an enum with three variants representing the builder’s state:

CodeBlock [src]
/// A single-entry, multi-exit control flow code.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub enum CodeBlock {
    /// Anonymous open block being constructed. Has no name yet — a name is
    /// allocated lazily when the block is terminated or needs to be referenced.
    ///
    /// Suppose you have `let x = 22;` this would create a set of statements like
    ///
    /// ```text
    /// stmts: [
    ///   let tmp1;
    ///   tmp1 = 22;
    ///   let x;
    ///   x = tmp;
    /// ]
    /// ```
    ///
    /// and then you have `let y = 33;`
    ///
    /// ```text
    /// stmts: [
    ///   let tmp2;
    ///   tmp2 = 33;
    ///   let y;
    ///   y = tmp2;
    /// ]
    /// ```
    ///
    /// and then you append those two so you get
    ///
    /// ```text
    /// stmts = [... /* as above, concatenated */]
    /// ```
    Block { stmts: Vec<lang::Statement> },

    /// Finalized blocks plus one open fallthrough block.
    ///
    /// For something like this `if foo { 22 } else { 33 }; ...``
    ///
    /// ```text
    /// blocks:
    ///   a {
    ///     if foo goto b else c
    ///   }
    ///   b {
    ///      tmp = 22
    ///      goto d
    ///   }
    ///   c {
    ///     tmp = 33
    ///     goto d
    ///   }
    /// entry: a
    /// fallthrough: d
    /// fallthrough_stmts: [...] // from the `...` above
    /// }}
    /// ```
    Open {
        /// Blocks that have a terminator and thus are "done".
        /// They may branch to each other or to the (incomplete) fallthrough block, as shown above.
        blocks: Map<lang::BbName, lang::BasicBlock>,

        /// The entry block in the list above or the fallthrough.
        entry: lang::BbName,

        /// The name of the not-yet-compelted fallthrough block, which may be referenced above.
        fallthrough: lang::BbName,

        /// The statements in the fallthrough block so far.
        fallthrough_stmts: Vec<lang::Statement>,
    },

    /// Finalized blocks (i.e., blocks with terminators) only. No successors.
    Closed {
        /// Blocks that have a terminator and thus are "done".
        blocks: Map<lang::BbName, lang::BasicBlock>,

        /// The entry block in the list above or the fallthrough.
        entry: lang::BbName,
    },
}

Lifecycle

A code block starts as an anonymous Block (just accumulating statements) and transitions to named states (Open / Closed) once it needs basic-block structure:

flowchart LR
    start(( )) -->|"new()"| Block
    start -->|"named(bb)"| Open
    Block -->|"terminate(_, None)"| Closed
    Block -->|"terminate(_, Some(bb))"| Open
    Closed -->|"terminate(_, Some(bb))"| Open
    Open -->|"terminate(_, None)"| Closed
    Open -->|"terminate(_, Some(bb))"| Open

    Block["
        Block

        Anonymous — just a list of statements with no name yet. A name is allocated lazily when the block is terminated.
    "]

    Closed["
        Closed

        All blocks are finalized. No fallthrough.
    "]
    
    Open["
        Open

        Finalized blocks plus one open fallthrough block (named, accepting statements).
    "]

Block is the starting state for simple expressions — statements accumulate without allocating any basic-block names. Most expressions stay in Block and get inlined into their parent via append. Once a terminator is needed (a call, branch, or explicit control flow), terminate names the block and transitions to Closed or Open.

Open is the state for regions that already have named blocks. Statements accumulate into the named fallthrough block. terminate finalizes the fallthrough and either closes the code block or opens a new fallthrough.

Once a code block is Closed, mutations (with_stmt, terminate) are no-ops. This naturally handles dead code after return or break. However, terminate with Some(next_bb) on a Closed code block will re-open it — this is used for blocks reachable via break that aren’t continuations of the previous terminator.

Core operations

OperationEffect
with_stmt(s)Appends a statement to the current fallthrough (Block or Open)
terminate(t, None)Finalizes the fallthrough block → Closed
terminate(t, Some(bb))Finalizes the fallthrough block, opens bb as the new fallthrough → Open
append(other)Sequential composition (see below)
into_blocks()Extracts the final Map<BbName, BasicBlock>

The append operation

append is how sequential composition works. Its behavior depends on whether other is anonymous or named:

selfotherBehavior
BlockBlockConcatenate statements (both anonymous — no names needed)
BlockOpen/ClosedName self, emit Goto to other’s entry, merge other’s blocks
OpenBlockExtend fallthrough with other’s statements
OpenOpen/ClosedEmit Goto from fallthrough to other’s entry, merge other’s blocks

The key invariant: once a block is named, it is never destroyed. When appending a named code block (Open or Closed), its entry block is preserved as a jump target — append emits a Goto to it rather than inlining it. This ensures that blocks referenced by jumps (loop headers, break targets) always survive in the final map.

Compound helpers

These methods on CodegenFn combine terminate with block allocation:

HelperWhat it does
call(code block, fn, args, ret)Allocates next_bb, terminates with Call { next_block: next_bb } and opens next_bb
branch_on_bool_from(code block, cond, then, else)Allocates a join_bb, terminates with Switch, absorbs both branches, opens join_bb
build_loop(loop_start, exit, body)Creates a named code block at loop_start, appends body, back-edge to loop_start, opens exit

Judgments

codegen_block

The top-level block codegen creates a fresh code block, then appends each statement’s code block in sequence:

codegen_block [src]
codegen_block(global: CodegenGlobal, cfn: CodegenFn, scope: CodegenScope, block: Block,) => (CodeBlock, CodegenGlobal, CodegenFn)

codegen_stmt

Each statement produces a CodeBlock that gets appended to the enclosing block’s code block:

codegen_stmt [src]
codegen_stmt(global: CodegenGlobal, cfn: CodegenFn, scope: CodegenScope, stmt: Stmt,) => (CodeBlock, CodegenScope, CodegenGlobal, CodegenFn)

codegen_expr_into

Expressions are compiled into a target local. Each rule returns a CodeBlock:

codegen_expr_into [src]
codegen_expr_into(global: CodegenGlobal, cfn: CodegenFn, scope: CodegenScope, target: MiniRustLocal, expr: Expr,) => (CodeBlock, CodegenGlobal, CodegenFn)

Walkthrough: literal expression

The simplest case — a literal like 42_u32:

codegen_expr_into::literal [src]
(let code = cfn.fresh_code_block())
---- ("literal")
(codegen_expr_into(global, cfn, scope, target, Expr::Literal { value, ty }) => (
    code.assign(target, constant(value, ty)),
    global,
    cfn,
))
  1. fresh_code_block() creates a CodeBlock::Block { stmts: [] } (anonymous)
  2. .assign(target, constant(42, u32)) pushes one Assign statement → still Block { stmts: [assign] }

The caller appends this single-block code block into its own code block. Since it’s just a Block, append concatenates the statements directly — no basic-block names are allocated.

graph LR
    subgraph "returned CodeBlock (Block)"
        anon["(anonymous): target = const 42_u32"]
    end

Walkthrough: function call

A function call like foo(a, b) is more interesting:

codegen_expr_into::call [src]
(type_expr(cfn, scope, callee) => callee_ty)
(resolve_rigid(cfn, scope, callee_ty) => RigidTy { name: RigidName::FnDef(fn_id), parameters })
(let (fn_name, global) = global.ensure_fn(MonoKey::new(fn_id, parameters)))
(let (callee_temp, cfn) = cfn.alloc_temp(&callee_ty)?)
(codegen_expr_into(global, cfn, scope, callee_temp, callee) => (code, global, cfn))
(let (temps, cfn) = alloc_temps_for_args(cfn, scope, args)?)
(for_all(i in 0..args.len()) with(code, global, cfn)
    (codegen_expr_into(global, cfn, scope, &temps[i], &args[i]) => (arg_code, global, cfn))
    (let (code, cfn) = cfn.append_from(code, arg_code)))
(let (code, cfn) = cfn.call(code, fn_name, temps, target)?)
---- ("call")
(codegen_expr_into(global, cfn, scope, target, Expr::Call { callee, args }) => (code, global, cfn))
  1. Fresh code block starts as Block { stmts: [] }
  2. For each argument, codegen_expr_into returns a Block, and append concatenates its statements
  3. .call(...) allocates next_bb, terminates with Call { next_block: next_bb }Open with next_bb as fallthrough
graph LR
    subgraph "returned CodeBlock (Open)"
        bb0["bb0: arg stmts... → Call foo"]
        bb1["bb1: (open, empty)"]
        bb0 --> bb1
    end

The caller can now append more work into bb1.

Walkthrough: if/else

An if b { ... } else { ... } statement:

codegen_stmt::if [src]
(let (ct, cfn) = cfn.alloc_local(bool_ty()))
(codegen_expr_into(global, cfn, scope, ct, condition) => (cond_code, global, cfn))
(codegen_block(global, cfn, scope, then_block) => (then_code, global, cfn))
(codegen_block(global, cfn, scope, else_block) => (else_code, global, cfn))
(let (code, cfn) = cfn.branch_on_bool_from(cond_code, ct, then_code, else_code))
---- ("if")
(codegen_stmt(global, cfn, scope, Stmt::If { condition, then_block, else_block }) => (code, scope, global, cfn))
  1. Codegen the condition into a temp (cond_region — typically a Block)
  2. Codegen both branches independently (each returns its own CodeBlock)
  3. branch_on_bool_from terminates cond_region with a Switch, absorbs both branches, and opens a join block
graph TD
    subgraph "returned CodeBlock"
        cond["bb0: eval condition → Switch"]
        then_bb["bb1: then branch"]
        else_bb["bb2: else branch"]
        join["bb3: (open, join point)"]
        cond -->|"true"| then_bb
        cond -->|"false"| else_bb
        then_bb -->|Goto| join
        else_bb -->|Goto| join
    end

If both branches diverge (e.g., both return), there is no join block — the result is Closed.

Walkthrough: loop

A 'label: loop { body }:

codegen_stmt::loop [src]
(let label = require_label(label)?)
(let (loop_start, cfn) = cfn.fresh_bb())
(let (exit_block, cfn) = cfn.fresh_bb())
(let scope = scope.with_label(&label.id, Some(loop_start), exit_block))
(codegen_block(global, cfn, scope, body) => (body_code, global, cfn))
(let (code, cfn) = build_loop(cfn, loop_start, exit_block, body_code)?)
---- ("loop")
(codegen_stmt(global, cfn, scope, Stmt::Loop { label, body }) => (code, scope, global, cfn))

The build_loop helper constructs:

  1. CodeBlock::named(loop_start) — starts in Open state at loop_start
  2. Append the body code block (emits Goto from parent fallthrough into loop_start)
  3. If body has fallthrough, terminate with Goto(loop_start) back-edge and open exit
  4. If body diverges, open exit anyway (reachable via break)
graph TD
    subgraph "returned CodeBlock (Open)"
        loop_start["loop_start: body stmts... → Goto loop_start"]
        exit["exit: (open, fallthrough)"]
        loop_start -->|"back-edge"| loop_start
        loop_start -.->|"break"| exit
    end

A break 'label compiles to terminate(Goto(exit_block)), directing control to the exit.

When this loop code block is appended into the parent, the parent’s fallthrough block gets a Goto to loop_start — since loop_start is a named block, append preserves it as a jump target rather than inlining it.

Putting it all together

The final step is build_function, which:

  1. If the body code block still has fallthrough, appends a unit assignment + Return terminator
  2. Extracts the entry block name
  3. Calls into_blocks() to get the final Map<BbName, BasicBlock>
  4. Prepends StorageLive statements to the entry block for all non-argument locals
  5. Returns a lang::Function ready for MiniRust interpretation