Intro
a-mir-formality is an early-stage experimental project maintained by the Rust types team.
Its goal is to be a complete, authoritative formal model of the Rust MIR and type system.
If successful, the intention is to bring this model into Rust as an RFC and develop it as an official part of the language definition.
Workspace structure
The project is a Cargo workspace with three main crates, organized in layers:
- `formality-macros` — procedural macros that generate boilerplate for formality terms. The `#[term]` macro auto-derives the `Parse`, `Debug`, `Fold`, and `Visit` traits.
- `formality-core` — language-independent foundation for formal semantics: variable binding, the judgment/proof system, fixed-point computation, and collections. Reusable for modeling any language, not just Rust.
- `formality-rust` — the Rust-specific model, containing three subsystems:
  - `grammar/` — AST definitions for types, traits, impls, functions, ADTs, and a MIR subset
  - `check/` — semantic checking (entry point: `check_all_crates()`)
  - `prove/` — trait solving and type normalization
The root `a-mir-formality` binary parses a Rust program file and runs `check_all_crates()`.
Data flow
```
Program text
  → try_term() (parse)
  → Program AST
  → check_all_crates()
  → per-crate checking
  → trait proving / normalization
  → ProofTree result
```
Running tests
```shell
# Run the full test suite (what CI runs)
cargo test --all

# Run all targets (note: `--all-targets` does not run doc tests)
cargo test --all-targets

# Run a single integration test by name
cargo test -p a-mir-formality -- test_name

# Run tests in a specific crate
cargo test -p formality-rust
cargo test -p formality-core
```
To see tracing output from a test, set the RUST_LOG environment variable:
```shell
RUST_LOG=debug cargo test -p a-mir-formality -- test_name
```
Test macros
Integration tests use two snapshot-testing macros (powered by expect-test):
- `assert_ok!([ ... ])` — asserts that a program type-checks successfully
- `assert_err!([ ... ] expect![[...]])` — asserts that a program fails with a specific error message
formality_core: the Formality system
a-mir-formality is built on the formality core system, defined by the `formality_core` crate. Formality core is a mildly opinionated set of macros, derives, and types that let you write high-level Rust code in a way that resembles standard type-theory notation. Its primary purpose is to support a-mir-formality, but it can be used for other projects.
Defining your language
The first step in defining your own language is to use the `formality_core::declare_language!` macro. You use it like so:
```rust
formality_core::declare_language! {
    pub mod rust {
        const NAME = "Rust";
        type Kind = crate::grammar::ParameterKind;
        type Parameter = crate::grammar::Parameter;
        const BINDING_OPEN = '<';
        const BINDING_CLOSE = '>';
        const KEYWORDS = [
            "mut",
            "struct",
            "enum",
            "union",
            "const",
            "true",
            "false",
            "static",
            "let",
            "in",
            "loop",
            "break",
            "continue",
            "return",
            "fn_id",
            "exists",
            "call",
        ];
    }
}
```
The `declare_language!` macro will create a module with the name you specify (here, `rust`).
You have to tell it a few things:
- The `NAME` of your language, a string used for debugging.
- An enum defining the kinds of variables in your language; see the variables chapter for more information. For Rust, the kinds are types, lifetimes, and constants.
- An enum defining a parameter, i.e., the terms that can be used to represent the value of a variable; see the variables chapter for more information.
- Two characters, `BINDING_OPEN` and `BINDING_CLOSE`, defining the opening and closing characters for binders, e.g., `<` and `>`, to use when parsing.
Contents of the language module
The language module you create has various items in it:

- A `struct FormalityLang` that defines your language. Parts of `formality_core` (notably the traits that involve bound variables) are parameterized by this struct.
Specifying the language for a crate
That module will contain a language struct named `FormalityLang`.
Other parts of the formality system (e.g., auto-derives and the like)
need to know the language you are currently defining,
and they expect to find it at `crate::FormalityLang`.
Best practice is to add a `use` at the top of the crate defining your language.
For example, the `formality_rust` crate has:
```rust
/// Declare the language that we will use in `#[term]` macros.
pub use rust::FormalityLang;
```
Defining terms with the term macro
There are a few key things to understand, the first of which is the `#[term]` macro. This is a procedural macro that you can attach to a struct or enum declaration that represents a piece of Rust syntax or part of the trait-checking rules. It auto-derives a bunch of traits and functionality:
- rules for parsing from a string
- a `Debug` impl to serialize back out
- folding and substitution
- upcasting and downcasting impls for convenient type conversion
For some types, we opt not to use `#[term]` and instead implement the traits by hand. There are also individual derives, so you can derive some of the traits but not all.
Using #[term]
Let’s do a simple example. If we had a really simple language, we might define expressions like this:
```rust
#[term]
enum Expr {
    #[cast]
    Variable(Variable),
    #[grammar($v0 + $v1)]
    Add(Box<Expr>, Box<Expr>),
}

#[term($name)]
struct Variable {
    name: String,
}
```
The `#[term]` macro says that these are terms and that we should generate all the boilerplate automatically. Note that the generated code references `crate::FormalityLang`, so be sure to define your language appropriately.

The `#[term]` macro also accepts some internal annotations:
- `#[cast]` can be applied to an enum variant with a single argument. It says that this variant represents an "is-a" relationship, and hence that we should generate upcast/downcast impls to allow conversion. In this case, a variable is a kind of expression (wrapping a variable up into an expression doesn't carry any semantic meaning), so we want variables to be upcastable to expressions (and expressions downcastable to variables). The `#[cast]` attribute will therefore generate an impl `Variable: Upcast<Expr>` that lets you convert a variable to an expression, and a downcast impl `Expr: Downcast<Variable>` that lets you try to convert from an expression to a variable. Downcasting is fallible: the downcast only succeeds if this is an `Expr::Variable`. If this is an `Expr::Add`, the downcast returns `None`.
- There is a special-case version of `#[cast]` called `#[variable]`. It indicates that the variant represents a (type) variable; we are not using it here because this is an expression variable. Variables are treated specially in folding/parsing to allow for substitution, binders, etc.
- `#[grammar]` tells the parser and pretty printer how to parse this variant. The `$v0` and `$v1` mean "recursively parse the first and second arguments". This will parse a `Box<Expr>`, which of course is implemented to just parse an `Expr`.
If you are annotating a struct, `#[term]` accepts the grammar directly, so `#[term($name)] struct Variable` means "to parse a variable, just parse the `name` field (a string)".
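To make the is-a relationship concrete, here is a plain-Rust sketch of the upcast/downcast pattern that `#[cast]` models. The trait names mirror formality_core's `Upcast`/`Downcast`, but these definitions are illustrative, not the code the macro actually generates.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Variable {
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Variable(Variable),
    Add(Box<Expr>, Box<Expr>),
}

// Illustrative stand-ins for the formality_core traits.
trait Upcast<T> {
    fn upcast(self) -> T;
}

trait Downcast<T> {
    fn downcast(&self) -> Option<T>;
}

// Upcasting always succeeds: a variable *is* an expression.
impl Upcast<Expr> for Variable {
    fn upcast(self) -> Expr {
        Expr::Variable(self)
    }
}

// Downcasting is fallible: it succeeds only for `Expr::Variable`.
impl Downcast<Variable> for Expr {
    fn downcast(&self) -> Option<Variable> {
        match self {
            Expr::Variable(v) => Some(v.clone()),
            _ => None,
        }
    }
}
```
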
We could also define types and an environment, perhaps something like
```rust
#[term]
enum Type {
    Integer, // Default grammar is just the word `integer`
    String,  // Default grammar is just the word `string`
}

#[term] // Default grammar is just `env($bindings)`
struct Env {
    bindings: Set<(Variable, Type)>,
}
```
You can see that the #[term] macro will generate some default parsing rules if you don’t say anything.
We can then write code like
```rust
let env: Env = term("env({(x, integer)})");
```
This will parse the string, panicking if the string cannot be parsed or if it is ambiguous (can be parsed in multiple ways). This is super useful in tests.
These terms are just Rust types, so you can define methods in the usual way; e.g., this `Env::get` method will search for a variable named `v`:
```rust
impl Env {
    pub fn get(&self, v: &Variable) -> Option<&Type> {
        self.bindings
            .iter()
            .filter(|b| &b.0 == v)
            .map(|b| &b.1)
            .next()
    }
}
```
Parsing
Formality’s #[term] and #[grammar] attributes let you specify a grammar for your structs and enums, and the parser will automatically parse strings into those types. Here is a simple example:
```rust
#[term]
pub enum Expr {
    #[grammar($v0)]
    LocalVariable(Id),
    #[grammar($v0 $(v1))]
    FnCall(Id, Vec<Id>),
}

formality_core::id!(Id);
```
This defines an Expr that is either a local variable (a single identifier like x) or a function call (like f(x, y)). The #[grammar] attribute on each variant describes how to parse it: $v0 means “parse the first field” and $(v1) means “parse a parenthesized, comma-separated list for the second field.” For structs, the grammar goes directly on #[term]:
```rust
#[term($name : $ty)]
pub struct TypedBinding {
    name: Id,
    ty: Ty,
}
```
This parses strings like `x : i32`.
Symbols
A grammar consists of a series of symbols. Each symbol matches some text in the input string. Symbols come in two varieties:
- Most things are terminals or tokens: they just match themselves.
  - For example, the `*` in `#[grammar($v0 * $v1)]` is a terminal, and it means to parse a `*` from the input.
  - Delimiters are accepted but must be matched, e.g., `( /* tokens */ )` or `[ /* tokens */ ]`.
- The `$` character is used to introduce special matches. Generally these are nonterminals, which means they parse the contents of a field, where the grammar for a field is determined by its type.
  - If fields have names, then `$field` should name the field.
  - For positional fields (e.g., the two `Expr` arguments in `Mul(Expr, Expr)`), use `$v0`, `$v1`, etc.
- Valid uses of `$` are as follows:
  - `$field` — just parse the field's type
  - `$*field` — the field must be a collection of `T` (e.g., `Vec<T>`, `Set<T>`); parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`.
  - `$,field` — similar to the above, but uses a comma-separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`.
  - `$?field` — will parse `field` and use `Default::default()` if not present.
  - `$<field>` — parse `<E1, E2, E3>`, where `field` is a collection of `E`
  - `$<?field>` — same, but accept an empty string as an empty vector
  - `$(field)` — parse `(E1, E2, E3)`, where `field` is a collection of `E`
  - `$(?field)` — same, but accept an empty string as an empty vector
  - `$[field]` — parse `[E1, E2, E3]`, where `field` is a collection of `E`
  - `$[?field]` — same, but accept an empty string as an empty vector
  - `${field}` — parse `{E1, E2, E3}`, where `field` is a collection of `E`
  - `${?field}` — same, but accept an empty string as an empty vector
  - `$:guard <nonterminal>` — parses `<nonterminal>`, but only if the keyword `guard` is present. For example, `$:where $,where_clauses` would parse `where WhereClause1, WhereClause2, WhereClause3`, but would also accept nothing (in which case you would get an empty vector).
  - `$!` — marks a commit point; see the section on commit points below
  - `$$` — parse the terminal `$`
Default grammar
If no grammar is supplied, the default grammar is determined as follows:
- If a `#[cast]` or `#[variable]` annotation is present, the default grammar is just `$v0`.
- Otherwise, the default grammar is the name of the type (for structs) or variant (for enums), followed by `()` with the values for the fields in order. So `Mul(Expr, Expr)` would have the default grammar `mul($v0, $v1)`.
Ambiguity: parse all possibilities, disambiguate at the top
The parser takes a parse-all-possibilities approach. When parsing an enum, all variants are attempted, and all successful parses are returned — not just the “best” one. Ambiguity in a child nonterminal propagates upward through the parse tree, producing a cross-product of possibilities. For example, if a struct has two fields and each field’s nonterminal is 2-ways ambiguous, the struct produces 2 × 2 = 4 successful parses.
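The cross-product behavior can be sketched in plain Rust. The `Parses` alias and `parse_pair` helper below are hypothetical, not the real parser API: each nonterminal yields a list of (value, remaining-text) pairs, and a two-field term combines its fields' parses as a cross product.

```rust
// Each successful parse carries the value and the remaining input.
type Parses<'t, T> = Vec<(T, &'t str)>;

// Combine the parses of two fields: for every parse of the first field,
// continue parsing the second field at its remaining text, collecting
// every combination (the cross product).
fn parse_pair<'t, A: Clone, B>(
    parse_a: impl Fn(&'t str) -> Parses<'t, A>,
    parse_b: impl Fn(&'t str) -> Parses<'t, B>,
    text: &'t str,
) -> Parses<'t, (A, B)> {
    let mut out = Vec::new();
    for (a, rest) in parse_a(text) {
        for (b, rest2) in parse_b(rest) {
            out.push(((a.clone(), b), rest2));
        }
    }
    out
}
```

If each field is 2-ways ambiguous, `parse_pair` returns 2 × 2 = 4 parses, matching the description above.
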
Consider this example, which parses either a single identifier (e.g., x) or a “call” (e.g., x(y, z)):
```rust
#[term]
pub enum Expr {
    #[grammar($v0)]
    LocalVariable(Id),
    #[grammar($v0 $(v1))]
    FnCall(Id, Vec<Id>),
}
```
Given an input like `x(y, z)`, the result is actually ambiguous: it could be parsed as the expression `x` with remaining text `(y, z)`, or as a call expression consuming the whole input. The parser therefore returns both.
Disambiguation happens once, at the top level (when you call term("...") or core_term_with). Here we would filter out the parse as x because it failed to consume all the input. That leaves exactly one parse, and so term succeeds. If multiple distinct parses remain, the result is an ambiguity panic — this means your grammar genuinely has an unresolvable ambiguity for that input. You can resolve these with #[precedence] for intra-type ambiguity (like expression operator precedence) or #[reject] for cross-type ambiguity (like boundary ambiguity between two types).
Sometimes disambiguation happens during parsing. For example, suppose we add another layer to the example, like `Sum`:
```rust
#[term]
pub enum Expr {
    #[grammar($v0)]
    LocalVariable(Id),
    #[grammar($v0 $(v1))]
    FnCall(Id, Vec<Id>),
}

#[term]
pub enum Sum {
    #[grammar($v0 + $v1)]
    Add(Expr, Expr),
}
```
Now when parsing x(y) + z as a Sum, we would recursively parse it as an Expr and encounter the same two results:
- parsed `x` as an `Expr::LocalVariable`, remaining text `(y) + z`
- parsed `x(y)` as an `Expr::FnCall`, remaining text `+ z`
However, only the second one can be parsed as a Sum, because the Sum requires the next character to be +.
Failure, almost succeeding, and commit points
When parsing fails, we distinguish two cases:
- Failure — the input clearly didn't match. Usually this means the first token wasn't a valid start for the nonterminal. You'll get an error like "expected an `Expr`".
- Almost succeeded — we got partway through parsing, consuming some tokens, but then hit an error. For example, parsing `"1 / / 3"` as an `Expr` might give "expected an `Expr`, found `/`".
The distinction matters when parsing optional ($?field) or repeated ($*field) nonterminals. If parsing field outright fails, we treat the field as absent and continue with its Default::default() value. If parsing field almost succeeds, we assume it was present but malformed, and report a syntax error.
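The rule for optional fields can be sketched as follows. The `Outcome` enum and `optional_field` function are hypothetical stand-ins for the parser's internal bookkeeping, not real formality_core types:

```rust
// The three ways parsing an optional field can go (illustrative model).
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Success(T),
    Failure,         // consumed no tokens: the field simply wasn't there
    AlmostSucceeded, // consumed some tokens, then errored: malformed field
}

// Outright failure falls back to the default value; an "almost succeeded"
// parse is surfaced as a syntax error, as described above.
fn optional_field<T: Default>(outcome: Outcome<T>) -> Result<T, &'static str> {
    match outcome {
        Outcome::Success(v) => Ok(v),
        Outcome::Failure => Ok(T::default()),
        Outcome::AlmostSucceeded => Err("syntax error in optional field"),
    }
}
```
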
The default rule is that parsing “almost” succeeds if it consumes at least one token. For example:
```rust
#[term]
enum Projection {
    #[grammar(. $v0)]
    Field(Id),
}
```
Parsing ".#" as a Projection would “almost” succeed — it consumes the . but then fails to find an identifier.
Sometimes this default is too aggressive. Consider Projection embedded in another type:
```rust
#[term($*projections . #)]
struct ProjectionsThenHash {
    projections: Vec<Projection>,
}
```
We’d like ".#" to be a valid ProjectionsThenHash — zero projections followed by .#. But the parser sees the . as an “almost success” of a Projection, so it reports a syntax error instead of treating the projections list as empty.
Commit points
You can control this with $!, which marks a commit point. With $!, a parse is only considered to have “almost succeeded” if it reached the commit point. Before the commit point, failure is just failure (the variant wasn’t present).
Adding $! after the identifier in Projection fixes the problem:
```rust
#[term]
enum Projection {
    #[grammar(. $v0 $!)]
    Field(Id),
}
```
Now .# fails outright (the commit point after $v0 was never reached), so ProjectionsThenHash correctly treats the projections list as empty and parses .# as the trailing literal tokens.
See the `parser_torture_tests::commit_points` code for a working example.
Variables and scope
Most compilers parse source text into an ambiguous syntax tree first, then apply name resolution and semantics to figure out what things mean. Formality aims to parse more directly into a semantically meaningful representation — for example, distinguishing variables from identifiers during parsing rather than in a later pass.
Today this shows up in one place: variable scope. When you write <X> X, the binder <X> introduces X into scope, and the body X is parsed as a variable reference rather than an identifier. This is the one piece of context the parser carries, but we may expand context-sensitive parsing in the future.
How variable/identifier disambiguation works
The problem: when an enum has both a #[variable] variant and fields that use id! types, the same text could parse as either a variable or an identifier. Without disambiguation, this produces an ambiguity panic.
The solution: when the parser begins parsing an enum that has a #[variable] variant, it runs an upfront probe — “can this text be parsed as any in-scope variable?” If yes, it sets a flag on the parser stack. All id! types check this flag before accepting an identifier and reject the text if the flag is set.
To see how this works, consider a type system with types and permissions, where types can be variables, named types, or a permission applied to a type:
```rust
#[term]
pub enum Ty {
    #[variable(Kind::Ty)]
    Variable(Variable),
    #[cast]
    Id(Id),
    #[grammar($v0 $v1)]
    Apply(Perm, Arc<Ty>),
    #[grammar($v0 :: $v1)]
    Assoc(Arc<Ty>, AssocId),
}

#[term]
pub enum Perm {
    #[variable(Kind::Perm)]
    Variable(Variable),
}

formality_core::id!(Id);
formality_core::id!(AssocId);
```
Same-type disambiguation. With type variable T in scope, parse T as a Ty. Because Ty has a #[variable] variant, the parser probes: “can T be parsed as any in-scope variable?” Yes — T is a type variable. The flag is set. Now when the Id variant tries to parse T, Id (an id! type) checks the flag, sees it’s set, and rejects. Only Variable(T) succeeds. Without the flag, both Variable(T) and Id(T) would succeed — ambiguity panic.
Cross-type disambiguation. With perm variable P in scope, parse P i32 as a Ty. The parser begins parsing Ty at P and runs the probe. The probe asks “can P be parsed as any in-scope variable?” — and it can, because P is a perm variable. The probe checks for variables of any kind, not just the kind associated with this type’s #[variable] variant. So the flag is set. Id rejects P. But Perm’s #[variable] variant does match P, so the Apply variant succeeds: Apply(Variable(P), Id(i32)).
This cross-kind behavior is what makes the mechanism work across type boundaries. Ty’s probe finds the perm variable P even though Ty’s own #[variable] variant only accepts type variables. This prevents Id from claiming a name that belongs to a sibling type’s variable namespace.
Positional scoping. Parse i32::T as a Ty, with type variable T in scope. The outer Ty parse starts at i32::T. The probe asks “can i32::T be parsed as a variable?” No — i32 isn’t a variable name, so no flag is set. The Assoc variant parses i32 as a Ty, consumes ::, then parses T as an AssocId. At that text position, no type with a #[variable] variant is being parsed (only AssocId, an id! type, is being parsed here), so no probe has run. AssocId accepts T as a plain identifier. This is the right behavior — T after :: is an associated item name, not a variable reference.
See tests/parser_var_id_ambiguity.rs for working tests of all three scenarios.
Left-recursive grammars
We support left-recursive grammars. A grammar is left-recursive when one of its variants starts by parsing itself. For example, a path like a.b[c.d].e is naturally left-recursive:
```rust
#[term]
pub enum Path {
    #[cast]
    Id(Id),
    #[grammar($v0 . $v1)]
    Field(Arc<Path>, Id),
    #[grammar($v0 [ $v1 ])]
    Index(Arc<Path>, Arc<Path>),
}

formality_core::id!(Id);
```
This works because the parser handles left-recursion with a fixed-point loop: it first parses the base cases (just an Id), then repeatedly tries to extend those results with the left-recursive variants (Field, Index), accumulating all successful parses until no new ones are found. In this grammar there’s no ambiguity because . and [ are unambiguous delimiters between the parts.
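The extension strategy can be sketched with a simplified, unambiguous parser over token slices. The `Path` type and token-slice input here are illustrative; the real parser tracks all ambiguous results at each step rather than a single one.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Path {
    Id(String),
    Field(Box<Path>, String),
}

fn parse_path(tokens: &[&str]) -> Option<Path> {
    // Base case: the leading identifier.
    let mut node = Path::Id(tokens.first()?.to_string());
    let mut rest = &tokens[1..];
    // Extension loop: while a `. id` suffix remains, wrap the result so far
    // in another left-recursive `Field` layer, as the fixed-point loop would.
    while let [".", id, tail @ ..] = rest {
        node = Path::Field(Box::new(node), id.to_string());
        rest = tail;
    }
    // Succeed only if all tokens were consumed.
    rest.is_empty().then_some(node)
}
```
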
Precedence
Often left-recursive grammars are ambiguous. For example, arithmetic expressions:
```rust
#[term]
pub enum Expr {
    #[cast]
    Id(Id),
    #[grammar($v0 + $v1)]
    #[precedence(1)]
    Add(Arc<Expr>, Arc<Expr>),
    #[grammar($v0 * $v1)]
    #[precedence(2)]
    Mul(Arc<Expr>, Arc<Expr>),
}

formality_core::id!(Id);
```
Without the #[precedence] annotations, a + b * c would have two parses: (a + b) * c and a + (b * c). The precedence annotations resolve this. Higher numbers bind tighter, so * (level 2) binds tighter than + (level 1), giving a + (b * c).
The default associativity is left, which can be written explicitly as #[precedence(L, left)]. You can also specify right-associativity (#[precedence(L, right)]) or non-associativity (#[precedence(L, none)]). This affects how things of the same level are parsed:
- `a + b + c` when left-associative is `(a + b) + c`
- `a + b + c` when right-associative is `a + (b + c)`
- `a + b + c` when non-associative is an error
Explicit parenthesization
One thing precedence doesn’t give you is user-controlled grouping. In a real expression grammar you’d want to write (a + b) * c to override the default precedence. Formality doesn’t currently handle this automatically. You have to add an explicit Parens variant to your grammar:
```rust
#[term]
pub enum PlaceExprData {
    /// `x`
    ///
    /// A variable reference. Whether this is a place (lvalue) or
    /// value (rvalue) depends on context: place on the left of `=`
    /// or as operand of `&`, value everywhere else.
    #[cast]
    Var(ValueId),

    /// `* expr`
    ///
    /// Dereference. Like `Var`, place vs value depends on context.
    #[grammar(* $prefix)]
    Deref { prefix: PlaceExpr },

    /// `( expr )`
    ///
    /// Parenthesized expression, needed so users can write `(*x).field`.
    #[grammar(($v0))]
    Parens(PlaceExpr),

    /// `expr . field`
    ///
    /// Field projection. Like `Var`, place vs value depends on context.
    #[grammar($prefix . $field_name)]
    Field {
        prefix: PlaceExpr,
        field_name: FieldName,
    },
}
```
Here PlaceExprData has a prefix operator (* expr) and a postfix operator (expr . field). Without the Parens variant, there’s no way to distinguish *(x.field) from (*x).field. Adding #[grammar(($v0))] Parens(PlaceExpr) lets the user write (*x).field to make the grouping explicit.
This is admittedly a bit awkward. Formality’s design goal is that the type structure matches what you’d find in a paper, and Parens nodes are parser machinery, not semantics. We may improve this in the future.
Cross-type ambiguity and #[reject]
Precedence handles ambiguity within a single recursive type, but sometimes the ambiguity is between two types. Consider a grammar where a Perm can be composed by juxtaposition (Perm Perm) and a Ty can apply a perm to a type (also by juxtaposition):
```rust
#[term]
pub enum Perm {
    #[grammar(leaf)]
    Leaf,
    #[grammar(given)]
    Given,
    #[cast]
    Id(PermId),
    #[grammar($v0 $v1)]
    Apply(Arc<Perm>, Arc<Perm>),
}

formality_core::id!(PermId);

#[term]
pub enum Ty {
    #[grammar($v0)]
    Named(TyId),
    #[grammar($v0 $v1)]
    #[reject(Perm::Apply(..), _)]
    ApplyPerm(Perm, Arc<Ty>),
}

formality_core::id!(TyId);
```
Parsing leaf x Data as a Ty is ambiguous. The parser doesn’t know where Perm ends and Ty begins:
- `Perm = leaf`, `Ty = ApplyPerm(Id(x), Named(Data))` — i.e., the perm is just `leaf`
- `Perm = Apply(Leaf, Id(x))`, `Ty = Named(Data)` — i.e., the perm is `leaf x`
Both consume all the input, so the normal disambiguation (longest match) doesn’t help. You could restructure the grammar, but in formality the grammar is the type, and you likely don’t want to extract a different type just to control parsing.
The #[reject] attribute solves this at the use site. In the example above, #[reject(Perm::Apply(..), _)] on Ty::ApplyPerm says “reject any parse where the first field is a compound perm.” The second interpretation is silently dropped, leaving just one unambiguous parse.
The syntax: each position in #[reject] corresponds to a field of the variant. Use _ to skip a field (no constraint). Non-wildcard positions use Rust patterns and are checked with matches! under the hood. All fields must be listed explicitly, or you can use trailing .. to skip remaining fields (e.g., #[reject(Perm::Apply(..), ..)]). Omitting fields without .. is a compile error. Multiple #[reject] attributes on the same variant are OR’d (any match rejects). Within a single #[reject], non-wildcard fields are AND’d.
For named fields (on structs), use name: pattern syntax:
```rust
#[term($perm $ty)]
#[reject(perm: Perm::Apply(..), ty: _)]
pub struct ApplyPerm {
    perm: Perm,
    ty: Arc<Ty>,
}
```
See tests/parser-torture-tests/reject.rs for the full set of examples.
Customizing the parse
If you prefer, you can customize the parse by annotating your term with #[customize(parse)]. In the Rust case, for example, the parsing of RigidTy is customized (as is the debug impl):
```rust
#[term((rigid $name $*parameters))]
#[customize(parse, debug)]
pub struct RigidTy {
    pub name: RigidName,
    pub parameters: Parameters,
}
```
You must then supply an impl of CoreParse<L> for your language type L. Inside the impl you will want to instantiate a Parser and then invoke parse_variant for every variant. The key methods on ActiveVariant are:
- `each_nonterminal(|value: T, p| { ... })` — the primary way to parse a child nonterminal. Instead of returning a single result, it calls the continuation for each successful parse of `T`, passing a forked `ActiveVariant` positioned at that parse's remaining text. This is how ambiguity propagates: if `T` has 2 parses, the continuation runs twice, and the results are collected.
- `each_comma_nonterminal(|items: Vec<T>, p| { ... })` — parse a comma-separated list, then run the continuation with the result.
- `each_opt_nonterminal(|opt: Option<T>, p| { ... })` — parse an optional nonterminal.
- `each_many_nonterminal(|items: Vec<T>, p| { ... })` — parse zero or more nonterminals.
- `each_delimited_nonterminal(open, optional, close, |items: Vec<T>, p| { ... })` — parse a delimited comma-separated list (e.g., `<T1, T2>`).
- `p.ok(value)` — wrap a value into a successful parse result. Use this instead of `Ok(value)` at the end of variant closures.
- `expect_char(c)`, `expect_keyword(kw)` — consume expected tokens.
- `reject_nonterminal::<T>()` — fail if the input starts with a `T` (useful for disambiguation).
The continuation-passing style means nonterminal parsing is always nested: you parse the first field, then in its continuation parse the second field, and so on. Each each_* call forks for each ambiguous child result, so you get the cross-product automatically.
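The nested, run-once-per-parse shape can be modeled in a few lines of plain Rust. The `each_parse` function below is a hypothetical miniature of `each_nonterminal`, not the real `ActiveVariant` API:

```rust
// Miniature model of `each_nonterminal`: run the continuation once per
// ambiguous parse of the child nonterminal.
fn each_parse<T: Copy>(parses: &[T], mut k: impl FnMut(T)) {
    for &p in parses {
        k(p);
    }
}

// Nesting two calls yields the cross product of their results.
fn cross_product() -> Vec<(&'static str, &'static str)> {
    let mut results = Vec::new();
    // Two ambiguous parses for the first field...
    each_parse(&["a1", "a2"], |a| {
        // ...and, in its continuation, two for the second field.
        each_parse(&["b1", "b2"], |b| {
            results.push((a, b));
        });
    });
    results
}
```
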
In the Rust code, the impl for RigidTy looks as follows:
```rust
// Implement custom parsing for rigid types.
impl CoreParse<Rust> for RigidTy {
    fn parse<'t>(scope: &Scope<Rust>, text: &'t str) -> ParseResult<'t, Self> {
        Parser::multi_variant(scope, text, "RigidTy", |parser| {
            // Parse a `ScalarId` (and upcast it to `RigidTy`) with the highest
            // precedence. If someone writes `u8`, we always interpret it as a
            // scalar-id.
            parser.parse_variant_cast::<ScalarId>(Precedence::default());

            // Parse something like `Id<...>` as an ADT.
            parser.parse_variant("Adt", Precedence::default(), |p| {
                // Don't accept scalar-ids as Adt names.
                p.reject_nonterminal::<ScalarId>()?;
                p.each_nonterminal(|name: AdtId, p| {
                    each_parse_parameters(p, |parameters, p| {
                        p.ok(RigidTy {
                            name: name.clone().upcast(),
                            parameters,
                        })
                    })
                })
            });

            // Parse `&`
            parser.parse_variant("Ref", Precedence::default(), |p| {
                p.expect_char('&')?;
                p.each_nonterminal(|lt: Lt, p| {
                    p.each_nonterminal(|ty: Ty, p| {
                        p.ok(RigidTy {
                            name: RigidName::Ref(RefKind::Shared),
                            parameters: seq![lt.clone().upcast(), ty.upcast()],
                        })
                    })
                })
            });

            parser.parse_variant("RefMut", Precedence::default(), |p| {
                p.expect_char('&')?;
                p.expect_keyword("mut")?;
                p.each_nonterminal(|lt: Lt, p| {
                    p.each_nonterminal(|ty: Ty, p| {
                        p.ok(RigidTy {
                            name: RigidName::Ref(RefKind::Mut),
                            parameters: seq![lt.clone().upcast(), ty.upcast()],
                        })
                    })
                })
            });

            parser.parse_variant("RawConst", Precedence::default(), |p| {
                p.expect_char('*')?;
                p.expect_keyword("const")?;
                p.each_nonterminal(|ty: Ty, p| {
                    p.ok(RigidTy {
                        name: RigidName::Raw(PtrKind::Const),
                        parameters: seq![ty.upcast()],
                    })
                })
            });

            parser.parse_variant("RawMut", Precedence::default(), |p| {
                p.expect_char('*')?;
                p.expect_keyword("mut")?;
                p.each_nonterminal(|ty: Ty, p| {
                    p.ok(RigidTy {
                        name: RigidName::Raw(PtrKind::Mut),
                        parameters: seq![ty.upcast()],
                    })
                })
            });

            parser.parse_variant("Tuple", Precedence::default(), |p| {
                p.expect_char('(')?;
                p.reject_custom_keywords(&["alias", "rigid", "predicate"])?;
                p.each_comma_nonterminal(|types: Vec<Ty>, p| {
                    p.expect_char(')')?;
                    let name = RigidName::Tuple(types.len());
                    p.ok(RigidTy {
                        name,
                        parameters: types.upcast(),
                    })
                })
            });
        })
    }
}
```
Customizing the debug
By default, the `#[term]` macro will generate a `Debug` impl that is guided by the `#[grammar]` attributes on your type (see the parsing section for more details). But sometimes you want custom logic. You can include a `#[customize(debug)]` declaration to allow that. Most of the time, when you do this, you will also want to customize parsing, as `RigidTy` does:
```rust
#[term((rigid $name $*parameters))]
#[customize(parse, debug)]
pub struct RigidTy {
    pub name: RigidName,
    pub parameters: Parameters,
}
```
Now you simply implement `Debug` in the usual way. Here is the impl for `RigidTy`:
```rust
impl Debug for RigidTy {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let RigidTy { name, parameters } = self;
        match name {
            RigidName::AdtId(name) => {
                write!(
                    f,
                    "{:?}{:?}",
                    name,
                    PrettyParameters::new("<", ">", parameters)
                )
            }
            RigidName::ScalarId(s) if parameters.is_empty() => {
                write!(f, "{:?}", s)
            }
            RigidName::Ref(RefKind::Shared) if parameters.len() == 2 => {
                write!(f, "&{:?} {:?}", parameters[0], parameters[1])
            }
            RigidName::Ref(RefKind::Mut) if parameters.len() == 2 => {
                write!(f, "&mut {:?} {:?}", parameters[0], parameters[1])
            }
            RigidName::Raw(PtrKind::Const) if parameters.len() == 1 => {
                write!(f, "*const {:?}", parameters[0])
            }
            RigidName::Raw(PtrKind::Mut) if parameters.len() == 1 => {
                write!(f, "*mut {:?}", parameters[0])
            }
            RigidName::Tuple(arity) if parameters.len() == *arity => {
                if *arity != 0 {
                    write!(f, "{:?}", PrettyParameters::new("(", ")", parameters))
                } else {
                    // PrettyParameters would skip the separators
                    // for 0 arity
                    write!(f, "()")
                }
            }
            _ => {
                write!(f, "{:?}{:?}", name, PrettyParameters::angle(parameters))
            }
        }
    }
}
```
Constructors
Unless you include #[customize(constructors)], the #[term] macro automatically creates constructors as follows:
- For a `struct`, it defines a `new` method that takes an `impl Upcast<T>` for each of your fields.
- For an `enum`, it defines a method per variant that has fields, named after the variant (converted to snake case).
  - If the name of the variant is a Rust keyword like `Struct`, the method will be called `struct_`.
  - We do not generate constructors for variants with no arguments.
- If the name of the variant is a Rust keyword like
Variables
As in Defining your language, `declare_language!` implements `Language` by setting the associated types `Kind` and `Parameter`. They must satisfy the bounds on that trait (including `HasKind` for `Parameter`); in `formality_rust`, both are enums:
- `Kind` — the sorts of parameters (type, lifetime, const, …); classifies variables and binders.
- `Parameter` — the values a parameter can be instantiated to.
The generated language module also defines pub type aliases in grammar (Variable, Binder, Substitution, …) that specialize the formality_core types to FormalityLang.
Internally, variables use CoreVariable<L>:
- Universal variables — treated as unknowns assumed in the environment.
- Existential variables — metavariables that the rules will eventually constrain or solve.
- Bound variables — tied to an enclosing binder (de Bruijn level and slot index). `Binder::open` introduces fresh bound variables with `debruijn: None` (see `CoreBoundVar::fresh`); `Binder::new` closes the binder again.
CoreVariable::is_free: universals and existentials are always free. A bound variable is free iff debruijn is None (including after Binder::open).
In #[term] definitions, use #[variable] on variable variants so folding applies substitutions and parsing resolves names against in-scope bindings (see Defining terms with the term macro).
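The free/bound distinction above can be sketched with a toy enum (illustrative only; the real CoreVariable is generic over the language and richer than this):

```rust
// Toy sketch of the variable taxonomy (not formality's actual types).
#[derive(Clone, Debug, PartialEq)]
enum Var {
    Universal(usize),   // assumed in the environment; always free
    Existential(usize), // to be solved by the rules; always free
    // Tied to a binder; `debruijn: None` means the binder was opened.
    Bound { debruijn: Option<usize>, slot: usize },
}

// Mirrors the rule for CoreVariable::is_free described above.
fn is_free(v: &Var) -> bool {
    match v {
        Var::Universal(_) | Var::Existential(_) => true,
        // A bound variable is free only while its binder is open,
        // i.e. after Binder::open has set `debruijn` to None.
        Var::Bound { debruijn, .. } => debruijn.is_none(),
    }
}
```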
Collections
When using formality, it’s best to use the following collection types:
- For sequences, use the standard Vec type.
- formality_core::Set<T> – equivalent to BTreeSet but shorter. We use a BTreeSet because it has deterministic ordering for all operations.
- formality_core::Map<K, V> – equivalent to BTreeMap but shorter. We use a BTreeMap because it has deterministic ordering for all operations.
Macros
We also define macros:
- seq![...] – equivalent to vec![] but permits flattening with .. notation, as described below
- set![...] – like seq!, but produces a Set
In these macros you can “flatten” things that support IntoIterator, so set![..a, ..b] will effectively perform a set union of a and b.
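To make the flattening concrete, here is a hand expansion of what set![..a, ..b, 4] means, written as plain iterator chaining. This shows the semantics only, not the macro's literal output:

```rust
use std::collections::BTreeSet;

// Hand expansion of `set![..a, ..b, 4]`: each `..x` splices anything
// that implements IntoIterator into the resulting set.
fn union_example() -> BTreeSet<u32> {
    let a: BTreeSet<u32> = vec![1, 2].into_iter().collect();
    let b: BTreeSet<u32> = vec![2, 3].into_iter().collect();
    // set![..a, ..b, 4] is roughly:
    a.into_iter().chain(b).chain([4]).collect()
}
```

Because the result is a set, splicing two sets this way performs a union (duplicates collapse).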
Casting between collections and tuples
It is possible to upcast from variable tuple types to produce collections:
- A Vec<E1> can be upcast to a Vec<E2> if E1: Upcast<E2>.
- A Set<E1> can be upcast to a Set<E2> if E1: Upcast<E2>.
- Tuples of elements (e.g., (E1, E2) or (E1, E2, E3)) can be upcast to sets, up to a fixed arity.
- Sets and vectors can be downcast to () and (E, C), where () succeeds only for empty collections, and (E, C) extracts the first element E and a collection C with all remaining elements (note that elements in sets are always ordered, so the first element is well defined there). This is useful when writing judgment rules that operate over sequences and sets.
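A plain-Rust sketch of the (E, C) downcast on a set (illustrative helper, not formality's actual API): because BTreeSet iterates in sorted order, "first element plus rest" is well defined.

```rust
use std::collections::BTreeSet;

// Sketch of the `(E, C)` downcast: split an ordered set into its
// smallest element and the set of remaining elements.
// Returns None for the empty set, mirroring how the `()` downcast
// succeeds only on empty collections.
fn split_first(set: &BTreeSet<u32>) -> Option<(u32, BTreeSet<u32>)> {
    let first = *set.iter().next()?;
    let rest: BTreeSet<u32> = set.iter().copied().skip(1).collect();
    Some((first, rest))
}
```

Judgment rules over sets typically recurse through exactly this kind of split: handle the first element, then continue on the rest.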
Judgment functions and inference rules
The next thing is the judgment_fn! macro. This lets you write a judgment using inference rules. A “judgment” just means some kind of predicate that the computer can judge to hold or not hold. Inference rules are those rules you may have seen in papers and things:
premise1
premise2
premise3
------------------
conclusion
i.e., the conclusion is judged to be true if all the premises are true.
Judgments in type system papers can look all kinds of ways. For example, a common type system judgment would be the following:
Γ ⊢ E : T
This can be read as: in the environment Γ, the expression E has type T. You might have rules like these:
Γ[X] = ty // lookup variable in the environment
--------------- "variable"
Γ ⊢ X : ty
Γ ⊢ E1 : T // must have the same type
Γ ⊢ E2 : T
--------------- "add"
Γ ⊢ E1 + E2 : T
In a-mir-formality, you might write those rules like so:
#![allow(unused)]
fn main() {
judgment_fn! {
pub fn has_type(
env: Env,
expr: Expr,
) => Type {
(
(env.get(&name) => ty)
---------------
(has_type(env, name: Variable) => ty)
)
(
(has_type(env, left) => ty_left)
(has_type(env, right) => ty_right)
(if ty_left == ty_right)
---------------
(has_type(env, Expr::Add(left, right)) => ty_left)
)
}
}
}
Unlike mathematical papers, where judgments can look like whatever you want, judgments in a-mir-formality always have a fixed form that distinguishes inputs and outputs:
judgment_name(input1, input2, input3) => output
In this case, has_type(env, expr) => ty is the equivalent of Γ ⊢ E : T. Note that, by convention, we tend to use more English-style names, so env and not Γ, and expr and not E. Of course nothing stops you from using single letters, it's just a bit harder to read.
When we write the judgment_fn!, it desugars into an actual Rust function that looks like this:
#![allow(unused)]
fn main() {
pub fn has_type(arg0: impl Upcast<Env>, arg1: impl Upcast<Expr>) -> ProvenSet<Type> {
let arg0: Env = arg0.upcast();
let arg1: Expr = arg1.upcast();
...
}
}
Some things to note. First, the function arguments (arg0, arg1) implicitly accept anything that “upcasts” (infallibly converts) into the desired types. Upcast is a trait defined within a-mir-formality and implemented by the #[term] macro automatically.
Second, the function always returns a ProvenSet. This is because there can be multiple rules, and they may match in multiple ways. The generated code is going to exhaustively search to find all the ways that the rules could match. Unlike a plain Set, a ProvenSet also preserves proof trees for successful results and structured failure information when no rule applies. At a high level, the code looks like this (at least if we ignore the possibility of cycles; we’ll come back to that later):
#![allow(unused)]
fn main() {
pub fn has_type(arg0: impl Upcast<Env>, arg1: impl Upcast<Expr>) -> ProvenSet<Type> {
let arg0: Env = arg0.upcast();
let arg1: Expr = arg1.upcast();
let mut results = Map::new();
if /* inference rule 1 matches */ {
results.insert(/* result from inference rule 1 */, /* proof tree */);
}
if /* inference rule 2 matches */ {
results.insert(/* result from inference rule 2 */, /* proof tree */);
}
// ...
if /* inference rule N matches */ {
results.insert(/* result from inference rule N */, /* proof tree */);
}
if !results.is_empty() {
ProvenSet::proven(results)
} else {
ProvenSet::failed_rules(/* judgment */, /* location */, /* failed rules */)
}
}
}
So how do we test if a particular inference rule matches? Let’s look more closely at the code we would generate for this inference rule:
#![allow(unused)]
fn main() {
(
(env.get(&name) => ty)
---------------
(has_type(env, name: Variable) => ty)
)
}
The first part of the final line, has_type(env, name: Variable), defines patterns that are matched against the judgment’s arguments (arg0, arg1). Patterns can either be trivial bindings (like env) or more complex (like name: Variable or Expr::Add(left, right)). In the latter case, they don’t have to match the type of the argument precisely. Instead, we use the Downcast trait combined with pattern matching. So this inference rule would compile to something like…
#![allow(unused)]
fn main() {
// Simple variable bindings just clone...
let env = arg0.clone();
// More complex patterns are downcast and tested...
if let Some(name) = arg1.downcast::<Variable>() {
... // rule successfully matched! See below.
}
}
Once we’ve matched the arguments, we start trying to execute the inference rule conditions. We have one, env.get(&name) => ty. What does that do? A condition written like $expr => $pat basically becomes a for loop, so you get…
#![allow(unused)]
fn main() {
let env = arg0.clone();
if let Some(name) = arg1.downcast::<Variable>() {
for ty in env.get(&name) {
... // other conditions, if any
}
}
}
Once we run out of conditions, we can generate the final result, which comes from the => $expr in the conclusion of the rule. In this case, something like this:
#![allow(unused)]
fn main() {
let env = arg0.clone();
if let Some(name) = arg1.downcast::<Variable>() {
for ty in env.get(&name) {
result.push(ty);
}
}
}
Thus each inference rule is converted into a little block of code that may push results onto the final set.
The second inference rule (for addition) looks like…
#![allow(unused)]
fn main() {
// given this...
// (
// (has_type(env, left) => ty_left)
// (has_type(env, right) => ty_right)
// (if ty_left == ty_right)
// ---------------
// (has_type(env, Expr::Add(left, right)) => ty_left)
// )
// we generate this...
let env = arg0.clone();
if let Some(Expr::Add(left, right)) = arg1.downcast() {
for ty_left in has_type(env, left) {
for ty_right in has_type(env, right) {
if ty_left == ty_right {
result.push(ty_left);
}
}
}
}
}
If you want to see a real judgment, take a look at the one for proving where clauses.
Handling cycles
Judgment functions must be inductive, which means that cycles are considered failures. We have a tabling implementation, which means we detect cycles and try to handle them intelligently. Basically, we track a stack and, if a cycle is detected, we return an empty set of results, but we remember that the cycle happened.

Then, once we are done, we’ll have computed some intermediate set of results R[0], and we execute again. This time, when we hit the cycle, we return R[0] instead of an empty set. This computes some new set of results, R[1], so we try again. We keep iterating until the new set of results R[i] is equal to the previous set R[i-1]. At that point we have reached a fixed point, so we stop.

Of course, it could be that you get an infinitely growing set of results, and execution never terminates. This means your rules are broken. Don’t do that.
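The iteration described above can be sketched on a toy judgment: "m is reachable from n" over a small cyclic graph. This is a minimal illustration of the fixed-point scheme, not the real tabling implementation (which keys its table by judgment input and handles much more):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy cyclic graph: 0 -> 1 -> 2 -> 0 and 2 -> 3.
fn successors(n: u32) -> &'static [u32] {
    match n {
        0 => &[1],
        1 => &[2],
        2 => &[0, 3],
        _ => &[],
    }
}

type Table = BTreeMap<u32, BTreeSet<u32>>;

// One pass of rule application. On a cycle (node already on the stack),
// answer with the previous iteration's results for that node (empty at first).
fn step(n: u32, stack: &mut Vec<u32>, prev: &Table) -> BTreeSet<u32> {
    if stack.contains(&n) {
        return prev.get(&n).cloned().unwrap_or_default();
    }
    stack.push(n);
    let mut out = BTreeSet::new();
    for &s in successors(n) {
        out.insert(s);
        out.extend(step(s, stack, prev));
    }
    stack.pop();
    out
}

// Iterate R[0], R[1], ... until R[i] == R[i-1]: the fixed point.
fn reachable(n: u32) -> BTreeSet<u32> {
    let mut table = Table::new();
    loop {
        let result = step(n, &mut Vec::new(), &table);
        if table.get(&n) == Some(&result) {
            return result;
        }
        table.insert(n, result);
    }
}
```

On this graph, the first pass treats the cycle back to node 0 as producing nothing; the second pass feeds the first pass's answer into the cycle, and the result stabilizes.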
FAQ and troubleshooting
Why am I getting errors about undefined references to crate::FormalityLang?
The various derive macros need to know what language you are working in.
To figure this out, they reference crate::FormalityLang, which you must define.
See the chapter on defining your language for more details.
formality_rust: the Rust model
formality_rust is the Rust-specific layer of a-mir-formality. It is organized into three main subsystems:
- grammar/ defines Rust-specific terms
- check/ runs semantic checking
- prove/ discharges logical obligations
grammar/
The grammar module defines the Rust terms used by the model: crates, items, types, where-clauses, trait references, and the MIR-like expression language used in function bodies.
check/
The main checking entry point is check_all_crates.
- inserts an empty core crate if the first crate is not core
- checks every prefix of the crate list
- treats the last crate in each prefix as the current crate
Within a crate, checking rejects duplicate item names, checks each crate item, and then runs coherence checking.
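The prefix-driving loop can be sketched like so (illustrative names and string stand-ins; the real check_all_crates operates on crate ASTs and returns richer errors):

```rust
// Hedged sketch of the check_all_crates driving loop.
fn check_all_crates(crates: &[&str]) -> Result<(), String> {
    let mut names: Vec<String> = crates.iter().map(|s| s.to_string()).collect();
    // Insert an empty `core` crate if the first crate is not `core`.
    if names.first().map(String::as_str) != Some("core") {
        names.insert(0, "core".to_string());
    }
    // Check every prefix of the crate list; the last crate in each
    // prefix plays the role of the "current" crate.
    for i in 1..=names.len() {
        let prefix = &names[..i];
        let current = prefix.last().unwrap();
        check_crate(prefix, current)?;
    }
    Ok(())
}

// Stand-in for per-crate checking (duplicate names, items, coherence).
fn check_crate(_visible: &[String], _current: &str) -> Result<(), String> {
    Ok(())
}
```

Checking each prefix means every crate is checked in a context where only earlier crates are visible, which matters for coherence and orphan rules.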
prove/
The prove module answers Rust-specific goals such as where-clauses, equality, subtyping, and outlives. Checking code calls into prove whenever it needs to establish those facts.
Pipeline
At a high level, checking a Rust program looks like this:
Crates
-> check_all_crates
-> item checking / borrow checking
-> prove obligations as needed
Borrow checking
Borrow checking lives under check/borrow_check/. Its top-level judgment is borrow_check, which checks that the loans issued while executing a basic block are respected.
borrow_check
At the top level, borrow checking works over:
- a TypeckEnv
- a set of assumptions
- a FlowState
- a Block
borrow_check delegates to borrow_check_block, which walks the block and updates the flow state statement by statement.
FlowState
FlowState is the flow-sensitive state threaded through borrow checking. It contains:
- scopes
- current
- breaks
- continues
Its current field is a PointFlowState, which tracks:
- loans_live
- outlives
Loans
A Loan represents a borrow produced by an expression like &'a place. It records:
- the lifetime of the borrow
- the borrowed place
- the borrow kind
Borrow checking adds loans to the current flow state and checks later accesses against the live loans.
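A hypothetical sketch of the data a Loan carries, using string stand-ins for the model's actual lifetime and place types (names illustrative):

```rust
// Toy versions of the three pieces of data a Loan records.
#[derive(Clone, Debug, PartialEq, Eq)]
enum BorrowKind {
    Shared, // from `&'a place`
    Mut,    // from `&'a mut place`
}

#[derive(Clone, Debug, PartialEq)]
struct Loan {
    lifetime: String, // the lifetime of the borrow, e.g. "'a"
    place: String,    // the borrowed place, e.g. "x.f"
    kind: BorrowKind, // shared or mutable
}
```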
Liveness
The borrow checker computes liveness with LivePlaces and the LiveBefore trait.
This is used to determine which places must be valid before a statement or expression, including at break and continue targets.
Outlives
Borrow checking also accumulates pending outlives constraints, represented as PendingOutlives { a, b }.
The outlives.rs code verifies these constraints:
- existential variables succeed trivially
- universal variables must be justified by the assumptions
Coherence checking
Coherence checking lives in check/coherence.rs and runs as part of crate checking.
At a high level, check_coherence:
- rejects duplicate impls in the current crate
- checks the current crate’s impls for overlap against all impls in the program
- runs orphan checking for positive impls in the current crate
- runs orphan checking for negative impls in the current crate
Orphan checking
orphan_check and orphan_check_neg instantiate the impl binder universally and prove that the trait reference is local under the impl’s where-clauses.
Duplicate impls
ensure_no_duplicate_impls rejects duplicate positive impls in the current crate before overlap checking runs.
Overlap checking
overlap_check_impl compares two impls of the same trait.
It first skips identical impls and impls for different traits. For matching trait ids, it tries to prove that the impls cannot both apply. If that fails, it reports the impls as overlapping.
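The control flow described above can be sketched with toy types. In the real model, "cannot both apply" is established through the prove layer; the syntactic self-type comparison below is only a stand-in:

```rust
// Toy impl representation (illustrative fields only).
#[derive(Clone, PartialEq)]
struct Impl {
    id: u32,
    trait_id: String,
    self_ty: String,
}

// Hedged sketch of the overlap-check control flow.
fn overlap_check_impl(a: &Impl, b: &Impl) -> Result<(), String> {
    // Skip identical impls and impls of different traits.
    if a == b || a.trait_id != b.trait_id {
        return Ok(());
    }
    // Try to prove the two impls cannot both apply...
    if provably_disjoint(a, b) {
        Ok(())
    } else {
        // ...and report an overlap if that proof fails.
        Err(format!("impls of trait `{}` may overlap", a.trait_id))
    }
}

// Toy stand-in for the real disjointness proof.
fn provably_disjoint(a: &Impl, b: &Impl) -> bool {
    a.self_ty != b.self_ty
}
```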
Negative impls
Negative impls participate in coherence through orphan checking.