Intro
formality_core: the Formality system
a-mir-formality is built on the formality core system,
defined by the formality_core crate.
Formality core is a mildly opinionated series of macros, derives, and types
that let you write high-level Rust code
in a way that resembles standard type theory notation.
Its primary purpose is to be used by a-mir-formality
but it can be used for other projects.
Defining your language
The very first thing you do to define your own language is to use the `formality_core::declare_language!` macro.
You use it like so:
```rust
formality_core::declare_language! {
    pub mod rust {
        const NAME = "Rust";
        type Kind = crate::grammar::ParameterKind;
        type Parameter = crate::grammar::Parameter;
        const BINDING_OPEN = '<';
        const BINDING_CLOSE = '>';
        const KEYWORDS = [
            "mut",
            "struct",
            "enum",
            "union",
            "const",
            "true",
            "false",
            "static",
        ];
    }
}
```
The `declare_language!` macro will create a module with the name you specify (here, `rust`).
You have to tell it a few things:
- The `NAME` of your language, a string used for debugging.
- An enum defining the kinds of variables in your language; see the variables chapter for more information. For Rust, the kinds are types, lifetimes, and constants.
- An enum defining a parameter, which is the term that can be used to represent the value of a variable; see the variables chapter for more information.
- Two characters `BINDING_OPEN` and `BINDING_CLOSE` defining the opening and closing characters for binders, e.g., `<` and `>`, to use when parsing.
Contents of the language module
The language module you create has various items in it:

- A struct `FormalityLang` that defines your language. Parts of `formality_core` (notably the traits that involve bound variables) are defined generically with respect to this language struct.
Specifying the language for a crate
That module will contain a language struct named FormalityLang.
Other parts of the formality system (e.g., autoderives and the like)
will need to know the current language you are defining,
and they expect to find it at crate::FormalityLang.
Best practice is to add a use at the top of your crate defining your language.
For example, the formality_rust crate has:
```rust
/// Declare the language that we will use in `#[term]` macros.
pub use rust::FormalityLang;
```
Defining terms with the term macro
There are a few key things to understand. The first is the `#[term]` macro. This is a procedural macro that you can attach to a struct or enum declaration that represents a piece of Rust syntax or part of the trait-checking rules. It auto-derives a bunch of traits and functionality:
- rules for parsing from a string
- a `Debug` impl to serialize back out
- folding and substitution
- upcasting and downcasting impls for convenient type conversion
For some types, we opt not to use #[term], and instead implement the traits by hand. There are also derives so you can derive some of the traits but not all.
Using #[term]
Let’s do a simple example. If we had a really simple language, we might define expressions like this:
```rust
#[term]
enum Expr {
    #[cast]
    Variable(Variable),
    #[grammar($v0 + $v1)]
    Add(Box<Expr>, Box<Expr>),
}

#[term($name)]
struct Variable {
    name: String,
}
```
The #[term] macro says that these are terms and we should generate all the boilerplate automatically. Note that it will generate code that references crate::FormalityLang so be sure to define your language appropriately.
The `#[term]` macro also accepts some internal annotations:

- `#[cast]` can be applied on an enum variant with a single argument. It says that this variant represents an "is-a" relationship and hence we should generate upcast/downcast impls to allow conversion. In this case, a variable is a kind of expression (i.e., wrapping a variable up into an expression doesn't carry any semantic meaning), so we want variables to be upcastable to expressions (and expressions to be downcast to variables). The `#[cast]` attribute will therefore generate an impl `Variable: Upcast<Expr>` that lets you convert a variable to an expression, and a downcast impl `Expr: Downcast<Variable>` that lets you try to convert from an expression to a variable. Downcasting is fallible, which means that the downcast will only succeed if this is an `Expr::Variable`. If this is an `Expr::Add`, the downcast would return `None`.
- There is a special-case version of `#[cast]` called `#[variable]`. It indicates that the variant represents a (type) variable; we are not using it here because this is an expression variable. Variables are treated specially in folding/parsing to allow for substitution, binders, etc.
- `#[grammar]` tells the parser and pretty printer how to parse this variant. The `$v0` and `$v1` mean "recursively parse the first and second arguments". This will parse a `Box<Expr>`, which of course is implemented to just parse an `Expr`.
If you are annotating a struct, the #[term] just accepts the grammar directly, so #[term($name)] struct Variable means “to parse a variable, just parse the name field (a string)”.
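To make the upcast/downcast behavior concrete, here is a plain-Rust sketch of roughly what the generated conversions amount to. Note that `From` and the `downcast_variable` method here are illustrative stand-ins, not formality_core's actual `Upcast`/`Downcast` traits:

```rust
// Illustrative stand-ins for the impls generated by `#[cast]`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Variable(Variable),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
struct Variable {
    name: String,
}

// Upcasting is infallible: a variable is always an expression.
impl From<Variable> for Expr {
    fn from(v: Variable) -> Expr {
        Expr::Variable(v)
    }
}

impl Expr {
    // Downcasting is fallible: only `Expr::Variable` yields a `Variable`.
    fn downcast_variable(&self) -> Option<Variable> {
        match self {
            Expr::Variable(v) => Some(v.clone()),
            _ => None,
        }
    }
}

fn main() {
    let v = Variable { name: "x".to_string() };
    let e: Expr = v.clone().into(); // upcast always succeeds
    assert_eq!(e.downcast_variable(), Some(v)); // downcast succeeds here...
    let sum = Expr::Add(Box::new(e.clone()), Box::new(e));
    assert_eq!(sum.downcast_variable(), None); // ...but not on `Add`
}
```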
We could also define types and an environment, perhaps something like
```rust
#[term]
enum Type {
    Integer, // Default grammar is just the word `integer`
    String,  // Default grammar is just the word `string`
}

#[term] // Default grammar is just `env($bindings)`
struct Env {
    bindings: Set<(Variable, Type)>,
}
```
You can see that the #[term] macro will generate some default parsing rules if you don’t say anything.
We can then write code like
```rust
let env: Env = term("env({(x, integer)})");
```
This will parse the string, panicking if the string cannot be parsed or if it is ambiguous (can be parsed in multiple ways). This is super useful in tests.
These terms are just Rust types, so you can define methods in the usual way, e.g. this Env::get method will search for a variable named v:
```rust
impl Env {
    pub fn get(&self, v: &Variable) -> Option<&Type> {
        self.bindings
            .iter()
            .filter(|b| &b.0 == v)
            .map(|b| &b.1)
            .next()
    }
}
```
Parsing
Formality’s #[term] and #[grammar] attributes let you specify a grammar for your structs and enums, and the parser will automatically parse strings into those types. Here is a simple example:
```rust
#[term]
pub enum Expr {
    #[grammar($v0)]
    LocalVariable(Id),
    #[grammar($v0 $(v1))]
    FnCall(Id, Vec<Id>),
}

formality_core::id!(Id);
```
This defines an Expr that is either a local variable (a single identifier like x) or a function call (like f(x, y)). The #[grammar] attribute on each variant describes how to parse it: $v0 means “parse the first field” and $(v1) means “parse a parenthesized, comma-separated list for the second field.” For structs, the grammar goes directly on #[term]:
```rust
#[term($name : $ty)]
pub struct TypedBinding {
    name: Id,
    ty: Ty,
}
```
This parses strings like x : i32.
Symbols
A grammar consists of a series of symbols. Each symbol matches some text in the input string. Symbols come in two varieties:
- Most things are terminals or tokens: this means they just match themselves.
  - For example, the `*` in `#[grammar($v0 * $v1)]` is a terminal, and it means to parse a `*` from the input.
  - Delimiters are accepted but must be matched, e.g., `( /* tokens */ )` or `[ /* tokens */ ]`.
- The `$` character is used to introduce special matches. Generally these are nonterminals, which means they parse the contents of a field, where the grammar for a field is determined by its type.
  - If fields have names, then `$field` should name the field.
  - For positional fields (e.g., the two `Expr` fields in `Mul(Expr, Expr)`), use `$v0`, `$v1`, etc.
- Valid uses of `$` are as follows:
  - `$field` – just parse the field's type
  - `$*field` – the field must be a collection of `T` (e.g., `Vec<T>`, `Set<T>`); parse any number of `T` instances. Something like `[ $*field ]` would parse `[f1 f2 f3]`, assuming `f1`, `f2`, and `f3` are valid values for `field`.
  - `$,field` – similar to the above, but uses a comma-separated list (with optional trailing comma). So `[ $,field ]` will parse something like `[f1, f2, f3]`.
  - `$?field` – will parse `field` and use the `Default::default()` value if not present.
  - `$<field>` – parse `<E1, E2, E3>`, where `field` is a collection of `E`
  - `$<?field>` – like `$<field>`, but accepts an empty string as an empty vector
  - `$(field)` – parse `(E1, E2, E3)`, where `field` is a collection of `E`
  - `$(?field)` – like `$(field)`, but accepts an empty string as an empty vector
  - `$[field]` – parse `[E1, E2, E3]`, where `field` is a collection of `E`
  - `$[?field]` – like `$[field]`, but accepts an empty string as an empty vector
  - `${field}` – parse `{E1, E2, E3}`, where `field` is a collection of `E`
  - `${?field}` – like `${field}`, but accepts an empty string as an empty vector
  - `$:guard <nonterminal>` – parses `<nonterminal>`, but only if the keyword `guard` is present. For example, `$:where $,where_clauses` would parse `where WhereClause1, WhereClause2, WhereClause3` but would also accept nothing (in which case, you would get an empty vector).
  - `$!` – marks a commit point; see the section on commit points below
  - `$$` – parse the terminal `$`
Default grammar
If no grammar is supplied, the default grammar is determined as follows:
- If a `#[cast]` or `#[variable]` annotation is present, then the default grammar is just `$v0`.
- Otherwise, the default grammar is the name of the type (for structs) or variant (for enums), followed by `()` containing the values for the fields in order. So `Mul(Expr, Expr)` would have a default grammar of `mul($v0, $v1)`.
Ambiguity: parse all possibilities, disambiguate at the top
The parser takes a parse-all-possibilities approach. When parsing an enum, all variants are attempted, and all successful parses are returned — not just the “best” one. Ambiguity in a child nonterminal propagates upward through the parse tree, producing a cross-product of possibilities. For example, if a struct has two fields and each field’s nonterminal is 2-ways ambiguous, the struct produces 2 × 2 = 4 successful parses.
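A toy version of this strategy can be written directly: each parse function returns every (value, remaining-input) pair rather than committing to one result. This is a self-contained sketch of the shape, not formality_core's actual parser:

```rust
// A toy "parse all possibilities" parser.
#[derive(Debug, PartialEq)]
enum Expr {
    LocalVariable(String),
    FnCall(String, Vec<String>),
}

// Parse a leading identifier; returns at most one (value, rest) pair.
fn parse_id(input: &str) -> Vec<(String, &str)> {
    let end = input
        .find(|c: char| !c.is_alphanumeric())
        .unwrap_or(input.len());
    if end == 0 {
        vec![]
    } else {
        vec![(input[..end].to_string(), &input[end..])]
    }
}

// Parse an Expr: try *every* variant and collect all successes.
fn parse_expr(input: &str) -> Vec<(Expr, &str)> {
    let mut results = vec![];
    for (id, rest) in parse_id(input) {
        // Variant 1: a bare local variable.
        results.push((Expr::LocalVariable(id.clone()), rest));
        // Variant 2: a call `id(arg, arg, ...)`.
        if let Some(rest2) = rest.strip_prefix('(') {
            if let Some(close) = rest2.find(')') {
                let args: Vec<String> = rest2[..close]
                    .split(',')
                    .map(|s| s.trim().to_string())
                    .filter(|s| !s.is_empty())
                    .collect();
                results.push((Expr::FnCall(id, args), &rest2[close + 1..]));
            }
        }
    }
    results
}

fn main() {
    // `x(y, z)` is ambiguous in isolation: both parses are returned.
    let parses = parse_expr("x(y, z)");
    assert_eq!(parses.len(), 2);
    // Top-level disambiguation keeps only parses that consume all input.
    let complete: Vec<_> = parses
        .into_iter()
        .filter(|(_, rest)| rest.is_empty())
        .collect();
    assert_eq!(complete.len(), 1);
}
```

The real parser works over typed nonterminals and scopes, but the key property is the same: ambiguity is represented as multiple results, and filtering happens at the top.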
Consider this example, which parses either a single identifier (e.g., x) or a “call” (e.g., x(y, z)):
```rust
#[term]
pub enum Expr {
    #[grammar($v0)]
    LocalVariable(Id),
    #[grammar($v0 $(v1))]
    FnCall(Id, Vec<Id>),
}
```
Given an input like `x(y, z)`, the result is actually ambiguous. It could be parsed as the expression `x` with remaining text `(y, z)`, or as a call expression; therefore, the parser returns both.
Disambiguation happens once, at the top level (when you call term("...") or core_term_with). Here we would filter out the parse as x because it failed to consume all the input. That leaves exactly one parse, and so term succeeds. If multiple distinct parses remain, the result is an ambiguity panic — this means your grammar genuinely has an unresolvable ambiguity for that input.
Sometimes disambiguation happens during parsing. For example, suppose we add another layer to the example, like `Sum`:
```rust
#[term]
pub enum Expr {
    #[grammar($v0)]
    LocalVariable(Id),
    #[grammar($v0 $(v1))]
    FnCall(Id, Vec<Id>),
}

#[term]
pub enum Sum {
    #[grammar($v0 + $v1)]
    Add(Expr, Expr),
}
```
Now when parsing x(y) + z as a Sum, we would recursively parse it as an Expr and encounter the same two results:
- parsed `x` as an `Expr::LocalVariable`, remaining text `(y) + z`
- parsed `x(y)` as an `Expr::FnCall`, remaining text `+ z`
However, only the second one can be parsed as a Sum, because the Sum requires the next character to be +.
Failure, almost succeeding, and commit points
When parsing fails, we distinguish two cases:
- Failure — the input clearly didn't match. Usually this means the first token wasn't a valid start for the nonterminal. You'll get an error like "expected an `Expr`".
- Almost succeeded — we got part-way through parsing, consuming some tokens, but then hit an error. For example, parsing `"1 / / 3"` as an `Expr` might give "expected an `Expr`, found `/`".
The distinction matters when parsing optional ($?field) or repeated ($*field) nonterminals. If parsing field outright fails, we treat the field as absent and continue with its Default::default() value. If parsing field almost succeeds, we assume it was present but malformed, and report a syntax error.
The default rule is that parsing “almost” succeeds if it consumes at least one token. For example:
```rust
#[term]
enum Projection {
    #[grammar(. $v0)]
    Field(Id),
}
```
Parsing ".#" as a Projection would “almost” succeed — it consumes the . but then fails to find an identifier.
Sometimes this default is too aggressive. Consider Projection embedded in another type:
```rust
#[term($*projections . #)]
struct ProjectionsThenHash {
    projections: Vec<Projection>,
}
```
We’d like ".#" to be a valid ProjectionsThenHash — zero projections followed by .#. But the parser sees the . as an “almost success” of a Projection, so it reports a syntax error instead of treating the projections list as empty.
Commit points
You can control this with $!, which marks a commit point. With $!, a parse is only considered to have “almost succeeded” if it reached the commit point. Before the commit point, failure is just failure (the variant wasn’t present).
Adding $! after the identifier in Projection fixes the problem:
```rust
#[term]
enum Projection {
    #[grammar(. $v0 $!)]
    Field(Id),
}
```
Now .# fails outright (the commit point after $v0 was never reached), so ProjectionsThenHash correctly treats the projections list as empty and parses .# as the trailing literal tokens.
See the parser_torture_tests::commit_points code for a working example.
Variables and scope
Most compilers parse source text into an ambiguous syntax tree first, then apply name resolution and semantics to figure out what things mean. Formality aims to parse more directly into a semantically meaningful representation — for example, distinguishing variables from identifiers during parsing rather than in a later pass.
Today this shows up in one place: variable scope. When you write <X> X, the binder <X> introduces X into scope, and the body X is parsed as a variable reference rather than an identifier. This is the one piece of context the parser carries, but we may expand context-sensitive parsing in the future.
To see why this matters, let’s extend our Expr example with a #[variable] variant:
```rust
#[term]
pub enum Expr {
    #[variable(Kind::Expr)]
    Variable(Variable),
    #[grammar($v0)]
    Name(Id),
    #[grammar($v0 ( $,v1 ))]
    FnCall(Id, Vec<Expr>),
}
```
#[variable] variants get priority
When an enum has a #[variable] variant, the parser tries it first, before any other variants. If the name is found in scope with the correct kind, the variable parse succeeds and the remaining variants are skipped entirely.
With X in scope, consider parsing X(y, z):
- The `Variable` variant tries first and finds `X` in scope — it succeeds with `Variable(X)`, remaining text `(y, z)`.
- The `Name` and `FnCall` variants are never tried.
At the top level, Variable(X) didn’t consume all the input, so this is a parse error. The #[variable] short-circuit prevented the FnCall parse from ever being attempted.
This is often the right behavior for a type grammar — if T is a type variable, then T is always a type variable, never a type name. But it’s a tradeoff: the short-circuit is aggressive, and it means the parser commits to the variable interpretation without seeing the surrounding context. For our Expr grammar, we probably don’t want #[variable] at all — function names are identifiers, not variables. We’d remove the #[variable] variant and let the parse-all-possibilities approach handle things naturally, as we saw in the Ambiguity section.
match_var = false on id! types
The #[variable] short-circuit resolves ambiguity within an enum — when the variable variant is present, it wins. But what about indirect ambiguity, where a variable and an identifier compete in different variants of a parent enum?
The id! macro generates identifier types. By default, id!() types will happily parse any valid identifier string, even if that string happens to be a variable in scope. To make an id! type reject variable names, declare it with match_var = false:
```rust
formality_core::id!(TraitId, match_var = false);
```
To see why this is useful, imagine a simplified type system where types are only type variables — no named types like i32. A bound like T : U could mean either “T implements trait U” or “T is a subtype of type U”:
```rust
#[term]
pub enum Ty {
    #[variable(Kind::Ty)]
    Variable(Variable),
}

#[term]
enum Bound {
    #[grammar($v0 : $v1)]
    Impl(Ty, TraitId),
    #[grammar($v0 : $v1)]
    Subtype(Ty, Ty),
}

formality_core::id!(TraitId); // default: accepts any identifier
```
With T and U in scope as type variables, parsing T : U is ambiguous. The Subtype variant parses U as a Ty, which succeeds via the #[variable] rule. The Impl variant parses U as a TraitId — and since id!(TraitId) accepts any identifier string (including variable names), it also succeeds. Two parses, ambiguity panic.
Adding match_var = false fixes this:
```rust
formality_core::id!(TraitId, match_var = false); // rejects variables in scope
```
Now when parsing T : U, the Impl variant tries to parse U as a TraitId. Because U is a variable in scope, match_var = false rejects it, so only Subtype succeeds. Parsing T : Debug (where Debug is not a variable) would still succeed as Impl, because match_var = false only rejects names that are actually variables in scope.
Identifiers that appear in positions where they should accept variable names — like associated item names after :: in <T as Trait>::Item — can use the default id!(AssociatedItemId) which accepts any valid identifier string.
Precedence and left-recursive grammars
We support left-recursive grammars like this one from the parse-torture-tests:
```rust
#[term]
pub enum Path {
    #[cast]
    Id(Id),
    #[grammar($v0 . $v1)]
    Field(Arc<Path>, Id),
    #[grammar($v0 [ $v1 ])]
    Index(Arc<Path>, Arc<Path>),
}

formality_core::id!(Id);
```
We also support ambiguous grammars. For example, you can code up arithmetic expressions like this:
```rust
#[term]
pub enum Expr {
    #[cast]
    Id(Id),
    #[grammar($v0 + $v1)]
    #[precedence(1)]
    Add(Arc<Expr>, Arc<Expr>),
    #[grammar($v0 * $v1)]
    #[precedence(2)]
    Mul(Arc<Expr>, Arc<Expr>),
}

formality_core::id!(Id);
```
When specifying the `#[precedence]` of a variant, the default is left-associativity, which can be written more explicitly as `#[precedence(L, left)]`. If you prefer, you can specify right-associativity (`#[precedence(L, right)]`) or non-associativity (`#[precedence(L, none)]`). This affects how things of the same level are parsed:

- `1 + 1 + 1` when left-associative is `(1 + 1) + 1`
- `1 + 1 + 1` when right-associative is `1 + (1 + 1)`
- `1 + 1 + 1` when non-associative is an error
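The grouping effect of associativity can be sketched in plain Rust by folding a token list from the left versus from the right (this is an illustration of the resulting tree shapes, not formality_core's parser):

```rust
// How associativity groups `1 + 1 + 1`: left-fold vs right-fold.
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}

// Left-associative: fold from the left, so earlier operands nest deeper.
fn parse_left(nums: &[i64]) -> Expr {
    nums[1..]
        .iter()
        .fold(Expr::Num(nums[0]), |acc, &n| add(acc, Expr::Num(n)))
}

// Right-associative: fold from the right instead.
fn parse_right(nums: &[i64]) -> Expr {
    let last = *nums.last().unwrap();
    nums[..nums.len() - 1]
        .iter()
        .rfold(Expr::Num(last), |acc, &n| add(Expr::Num(n), acc))
}

fn main() {
    // (1 + 1) + 1
    assert_eq!(
        parse_left(&[1, 1, 1]),
        add(add(Expr::Num(1), Expr::Num(1)), Expr::Num(1))
    );
    // 1 + (1 + 1)
    assert_eq!(
        parse_right(&[1, 1, 1]),
        add(Expr::Num(1), add(Expr::Num(1), Expr::Num(1)))
    );
}
```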
Customizing the parse
If you prefer, you can customize the parse by annotating your term with #[customize(parse)]. In the Rust case, for example, the parsing of RigidTy is customized (as is the debug impl):
```rust
#[term((rigid $name $*parameters))]
#[customize(parse, debug)]
pub struct RigidTy {
    pub name: RigidName,
    pub parameters: Parameters,
}
```
You must then supply an impl of CoreParse<L> for your language type L. Inside the impl you will want to instantiate a Parser and then invoke parse_variant for every variant. The key methods on ActiveVariant are:
- `each_nonterminal(|value: T, p| { ... })` — the primary way to parse a child nonterminal. Instead of returning a single result, it calls the continuation for each successful parse of `T`, passing a forked `ActiveVariant` positioned at that parse's remaining text. This is how ambiguity propagates: if `T` has 2 parses, the continuation runs twice, and the results are collected.
- `each_comma_nonterminal(|items: Vec<T>, p| { ... })` — parse a comma-separated list, then run the continuation with the result.
- `each_opt_nonterminal(|opt: Option<T>, p| { ... })` — parse an optional nonterminal.
- `each_many_nonterminal(|items: Vec<T>, p| { ... })` — parse zero or more nonterminals.
- `each_delimited_nonterminal(open, optional, close, |items: Vec<T>, p| { ... })` — parse a delimited comma-separated list (e.g., `<T1, T2>`).
- `p.ok(value)` — wrap a value into a successful parse result. Use this instead of `Ok(value)` at the end of variant closures.
- `expect_char(c)`, `expect_keyword(kw)` — consume expected tokens.
- `reject_nonterminal::<T>()` — fail if the input starts with a `T` (useful for disambiguation).
The continuation-passing style means nonterminal parsing is always nested: you parse the first field, then in its continuation parse the second field, and so on. Each each_* call forks for each ambiguous child result, so you get the cross-product automatically.
In the Rust code, the impl for RigidTy looks as follows:
```rust
// Implement custom parsing for rigid types.
impl CoreParse<Rust> for RigidTy {
    fn parse<'t>(scope: &Scope<Rust>, text: &'t str) -> ParseResult<'t, Self> {
        Parser::multi_variant(scope, text, "RigidTy", |parser| {
            // Parse a `ScalarId` (and upcast it to `RigidTy`) with the highest
            // precedence. If someone writes `u8`, we always interpret it as a
            // scalar-id.
            parser.parse_variant_cast::<ScalarId>(Precedence::default());

            // Parse something like `Id<...>` as an ADT.
            parser.parse_variant("Adt", Precedence::default(), |p| {
                // Don't accept scalar-ids as Adt names.
                p.reject_nonterminal::<ScalarId>()?;

                p.each_nonterminal(|name: AdtId, p| {
                    each_parse_parameters(p, |parameters, p| {
                        p.ok(RigidTy {
                            name: name.clone().upcast(),
                            parameters,
                        })
                    })
                })
            });

            // Parse `&`
            parser.parse_variant("Ref", Precedence::default(), |p| {
                p.expect_char('&')?;
                p.each_nonterminal(|lt: Lt, p| {
                    p.each_nonterminal(|ty: Ty, p| {
                        p.ok(RigidTy {
                            name: RigidName::Ref(RefKind::Shared),
                            parameters: seq![lt.clone().upcast(), ty.upcast()],
                        })
                    })
                })
            });

            parser.parse_variant("RefMut", Precedence::default(), |p| {
                p.expect_char('&')?;
                p.expect_keyword("mut")?;
                p.each_nonterminal(|lt: Lt, p| {
                    p.each_nonterminal(|ty: Ty, p| {
                        p.ok(RigidTy {
                            name: RigidName::Ref(RefKind::Mut),
                            parameters: seq![lt.clone().upcast(), ty.upcast()],
                        })
                    })
                })
            });

            parser.parse_variant("Tuple", Precedence::default(), |p| {
                p.expect_char('(')?;
                p.reject_custom_keywords(&["alias", "rigid", "predicate"])?;
                p.each_comma_nonterminal(|types: Vec<Ty>, p| {
                    p.expect_char(')')?;
                    let name = RigidName::Tuple(types.len());
                    p.ok(RigidTy {
                        name,
                        parameters: types.upcast(),
                    })
                })
            });
        })
    }
}
```
Customizing the debug
By default, the `#[term]` macro will generate a `Debug` impl that is guided by the `#[grammar]` attributes on your type (see the parsing section for more details). But sometimes you want custom logic. You can include a `#[customize(debug)]` declaration to allow that. Most of the time, when you do this, you will also want to customize parsing, as `RigidTy` does:
```rust
#[term((rigid $name $*parameters))]
#[customize(parse, debug)]
pub struct RigidTy {
    pub name: RigidName,
    pub parameters: Parameters,
}
```
Now you simply implement `Debug` in the usual way. Here is the impl for `RigidTy`:
```rust
impl Debug for RigidTy {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let RigidTy { name, parameters } = self;
        match name {
            RigidName::AdtId(name) => {
                write!(
                    f,
                    "{:?}{:?}",
                    name,
                    PrettyParameters::new("<", ">", parameters)
                )
            }
            RigidName::ScalarId(s) if parameters.is_empty() => {
                write!(f, "{:?}", s)
            }
            RigidName::Ref(RefKind::Shared) if parameters.len() == 2 => {
                write!(f, "&{:?} {:?}", parameters[0], parameters[1])
            }
            RigidName::Ref(RefKind::Mut) if parameters.len() == 2 => {
                write!(f, "&mut {:?} {:?}", parameters[0], parameters[1])
            }
            RigidName::Tuple(arity) if parameters.len() == *arity => {
                if *arity != 0 {
                    write!(f, "{:?}", PrettyParameters::new("(", ")", parameters))
                } else {
                    // PrettyParameters would skip the separators
                    // for 0 arity
                    write!(f, "()")
                }
            }
            _ => {
                write!(f, "{:?}{:?}", name, PrettyParameters::angle(parameters))
            }
        }
    }
}
```
Constructors
Unless you include #[customize(constructors)], the #[term] macro automatically creates constructors as follows:
- For a struct, it defines a `new` method that takes an `impl Upcast<T>` for each of your fields.
- For an enum, it defines a method per variant that has fields (named after the variant, converted to snake-case).
  - If the name of the variant is a Rust keyword like `Struct`, the method will be called `struct_`.
  - We do not generate constructors for variants with no arguments.
Variables
Collections
When using formality, it’s best to use the following collection types:
- For sequences, use the standard `Vec` type.
- `formality_core::Set<T>` — equivalent to `BTreeSet`, but shorter. We use a `BTreeSet` because it has deterministic ordering for all operations.
- `formality_core::Map<K, V>` — equivalent to `BTreeMap`, but shorter. We use a `BTreeMap` because it has deterministic ordering for all operations.
Macros
We also define macros:
- `seq![...]` — equivalent to `vec![...]`, but permits flattening with `..` notation, as described below
- `set![...]` — like `seq!`, but produces a `Set`
In these macros you can “flatten” things that support IntoIterator, so set![..a, ..b] will effectively perform a set union of a and b.
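As a plain-Rust approximation (using `BTreeSet` and iterator chaining directly rather than the formality_core macros), `set![..a, ..b]` produces the union of both collections:

```rust
use std::collections::BTreeSet;

fn main() {
    let a: BTreeSet<i32> = [1, 2].into_iter().collect();
    let b: BTreeSet<i32> = [2, 3].into_iter().collect();

    // Roughly what `set![..a, ..b]` evaluates to: the union of `a` and `b`.
    let union: BTreeSet<i32> = a.iter().chain(b.iter()).copied().collect();

    assert_eq!(union, [1, 2, 3].into_iter().collect());
}
```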
Casting between collections and tuples
It is possible to upcast between collection types and from tuples to collections:

- A `Vec<E1>` can be upcast to a `Vec<E2>` if `E1: Upcast<E2>`.
- A `Set<E1>` can be upcast to a `Set<E2>` if `E1: Upcast<E2>`.
- Tuples of elements (e.g., `(E1, E2)` or `(E1, E2, E3)`) can be upcast to sets, up to a fixed arity.
- Sets and vectors can be downcast to `()` and `(E, C)`, where `()` succeeds only for empty collections, and `(E, C)` extracts the first element `E` and a collection `C` with all remaining elements (note that elements in sets are always ordered, so the first element is well defined there). This is useful when writing judgment rules that operate over sequences and sets.
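The `(E, C)` downcast amounts to splitting off the first (smallest) element of an ordered collection. A sketch using a hypothetical `split_first` helper over `BTreeSet` (not formality_core's actual API):

```rust
use std::collections::BTreeSet;

// What downcasting a set to `(E, C)` amounts to: the first (smallest)
// element plus the collection of remaining elements.
// `split_first` is a hypothetical helper for illustration.
fn split_first<T: Ord + Clone>(set: &BTreeSet<T>) -> Option<(T, BTreeSet<T>)> {
    let first = set.iter().next()?.clone();
    let rest: BTreeSet<T> = set.iter().skip(1).cloned().collect();
    Some((first, rest))
}

fn main() {
    let s: BTreeSet<i32> = [3, 1, 2].into_iter().collect();
    let (first, rest) = split_first(&s).unwrap();
    assert_eq!(first, 1); // sets are ordered, so "first" is well defined
    assert_eq!(rest, [2, 3].into_iter().collect());
    // The `()` downcast corresponds to matching only the empty collection:
    assert!(split_first(&BTreeSet::<i32>::new()).is_none());
}
```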
Judgment functions and inference rules
The next thing is the judgment_fn! macro. This lets you write a judgment using inference rules. A “judgment” just means some kind of predicate that the computer can judge to hold or not hold. Inference rules are those rules you may have seen in papers and things:
```
premise1
premise2
premise3
------------------
conclusion
```
i.e., the conclusion is judged to be true if all the premises are true.
Judgments in type system papers can look all kinds of ways. For example, a common type system judgment would be the following:
```
Γ ⊢ E : T
```
This can be read as: in the environment Γ, the expression E has type T. You might have rules like these:
```
Γ[X] = ty // lookup variable in the environment
--------------- "variable"
Γ ⊢ X : ty

Γ ⊢ E1 : T // must have the same type
Γ ⊢ E2 : T
--------------- "add"
Γ ⊢ E1 + E2 : T
```
In a-mir-formality, you might write those rules like so:
```rust
judgment_fn! {
    pub fn has_type(
        env: Env,
        expr: Expr,
    ) => Type {
        (
            (env.get(&name) => ty)
            ---------------
            (has_type(env, name: Variable) => ty)
        )

        (
            (has_type(env, left) => ty_left)
            (has_type(env, right) => ty_right)
            (if ty_left == ty_right)
            ---------------
            (has_type(env, Expr::Add(left, right)) => ty_left)
        )
    }
}
```
Unlike mathematical papers, where judgments can look like whatever you want, judgments in a-mir-formality always have a fixed form that distinguishes inputs and outputs:

```
judgment_name(input1, input2, input3) => output
```
In this case, `has_type(env, expr) => ty` is the equivalent of `Γ ⊢ E : T`. Note that, by convention, we tend to use more English-style names, so `env` and not `Γ`, and `expr` and not `E`. Of course, nothing is stopping you from using single letters; it's just a bit harder to read.
When we write the `judgment_fn!`, it is going to desugar into an actual Rust function that looks like this:
```rust
pub fn has_type(arg0: impl Upcast<Env>, arg1: impl Upcast<Expr>) -> Set<Type> {
    let arg0: Env = arg0.upcast();
    let arg1: Expr = arg1.upcast();
    ...
}
```
Some things to note. First, the function arguments (arg0, arg1) implicitly accept anything that “upcasts” (infallibly converts) into the desired types. Upcast is a trait defined within a-mir-formality and implemented by the #[term] macro automatically.
Second, the function always returns a `Set`. This is because there can be multiple rules, and they may match in multiple ways. The generated code is going to exhaustively search to find all the ways that the rules could match. At a high level, the code looks like this (at least if we ignore the possibility of cycles; we'll come back to that later):
```rust
pub fn has_type(arg0: impl Upcast<Env>, arg1: impl Upcast<Expr>) -> Set<Type> {
    let arg0: Env = arg0.upcast();
    let arg1: Expr = arg1.upcast();
    let mut results = set![]; // we define this macro

    if /* inference rule 1 matches */ {
        results.push(/* result from inference rule 1 */);
    }

    if /* inference rule 2 matches */ {
        results.push(/* result from inference rule 2 */);
    }

    // ...

    if /* inference rule N matches */ {
        results.push(/* result from inference rule N */);
    }

    results
}
```
So how do we test if a particular inference rule matches? Let’s look more closely at the code we would generate for this inference rule:
```rust
(
    (env.get(&name) => ty)
    ---------------
    (has_type(env, name: Variable) => ty)
)
```
The first part of the final line, `has_type(env, name: Variable)`, defines patterns that are matched against the arguments (`arg0`, `arg1`) of the judgment. Patterns can either be trivial bindings (like `env`) or more complex (like `name: Variable` or `Expr::Add(left, right)`). In the latter case, they don't have to match the type of the argument precisely. Instead, we use the `Downcast` trait combined with pattern matching. So this inference rule would compile to something like…
```rust
// Simple variable bindings just clone...
let env = arg0.clone();

// More complex patterns are downcasted and tested...
if let Some(name) = arg1.downcast::<Variable>() {
    ... // rule successfully matched! See below.
}
```
Once we’ve matched the arguments, we start trying to execute the inference rule conditions. We have one, env.get(&name) => ty. What does that do? A condition written like $expr => $pat basically becomes a for loop, so you get…
```rust
let env = arg0.clone();
if let Some(name) = arg1.downcast::<Variable>() {
    for ty in env.get(&name) {
        ... // other conditions, if any
    }
}
```
Once we run out of conditions, we can generate the final result, which comes from the => $expr in the conclusion of the rule. In this case, something like this:
```rust
let env = arg0.clone();
if let Some(name) = arg1.downcast::<Variable>() {
    for ty in env.get(&name) {
        results.push(ty);
    }
}
```
Thus each inference rule is converted into a little block of code that may push results onto the final set.
The second inference rule (for addition) looks like…
```rust
// given this...
// (
//     (has_type(env, left) => ty_left)
//     (has_type(env, right) => ty_right)
//     (if ty_left == ty_right)
//     ---------------
//     (has_type(env, Expr::Add(left, right)) => ty_left)
// )
// we generate this...
let env = arg0.clone();
if let Some(Expr::Add(left, right)) = arg1.downcast() {
    for ty_left in has_type(env, left) {
        for ty_right in has_type(env, right) {
            if ty_left == ty_right {
                results.push(ty_left);
            }
        }
    }
}
```
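Putting the pieces together, the whole judgment can be written out by hand as ordinary Rust. The `Expr`, `Type`, and `Env` definitions below are toy stand-ins for the real terms (e.g., `Env` uses a plain `Vec` rather than formality_core's `Set`), but the shape of the generated search is the same:

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Type {
    Integer,
    String,
}

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Expr {
    Variable(String),
    Add(Box<Expr>, Box<Expr>),
}

struct Env {
    bindings: Vec<(String, Type)>,
}

impl Env {
    fn get(&self, v: &str) -> Option<&Type> {
        self.bindings.iter().find(|b| b.0 == v).map(|b| &b.1)
    }
}

// Hand-expanded form of the `has_type` judgment: each inference rule
// becomes a block that may push results onto the final set.
fn has_type(env: &Env, expr: &Expr) -> BTreeSet<Type> {
    let mut results = BTreeSet::new();

    // Rule "variable": look the variable up in the environment.
    if let Expr::Variable(name) = expr {
        for ty in env.get(name) {
            results.insert(ty.clone());
        }
    }

    // Rule "add": both operands must have the same type.
    if let Expr::Add(left, right) = expr {
        for ty_left in has_type(env, left) {
            for ty_right in has_type(env, right) {
                if ty_left == ty_right {
                    results.insert(ty_left.clone());
                }
            }
        }
    }

    results
}

fn main() {
    let env = Env {
        bindings: vec![("x".into(), Type::Integer), ("y".into(), Type::Integer)],
    };
    let e = Expr::Add(
        Box::new(Expr::Variable("x".into())),
        Box::new(Expr::Variable("y".into())),
    );
    assert_eq!(has_type(&env, &e), [Type::Integer].into_iter().collect());
}
```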
If you want to see a real judgment, take a look at the one for proving where clauses.
Handling cycles
Judgment functions must be inductive, which means that cycles are considered failures. That said, we have a tabling implementation: we detect cycles and try to handle them intelligently. Basically, we track a stack and, if a cycle is detected, we return an empty set of results, but we remember that the cycle happened. Then, once we are done, we'll have computed some intermediate set of results R[0], and we execute again. This time, when we get the cycle, we return R[0] instead of an empty set. This will compute some new set of results, R[1]. So then we try again. We keep doing this until the new set of results R[i] is equal to the previous set of results R[i-1]. At that point, we have reached a fixed point, so we stop.

Of course, it could be that you get an infinitely growing set of results, and execution never terminates. This means your rules are broken. Don't do that.
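The fixed-point loop above can be sketched in isolation. This toy "judgment" is graph reachability over a cyclic graph; `step` stands in for one full execution of the rules, answering any cyclic premise with the previous iteration's results (none of these names come from formality_core):

```rust
use std::collections::BTreeSet;

// One full pass over the rules. A cyclic premise ("reachable(b) if
// reachable(a)") is answered using the previous iteration's results,
// which start out empty.
fn step(prev: &BTreeSet<u32>, edges: &[(u32, u32)], start: u32) -> BTreeSet<u32> {
    let mut next = BTreeSet::new();
    next.insert(start);
    for &(a, b) in edges {
        if prev.contains(&a) {
            next.insert(b);
        }
    }
    next
}

// Iterate R[0], R[1], ... until R[i] == R[i-1]: the fixed point.
fn reachable(edges: &[(u32, u32)], start: u32) -> BTreeSet<u32> {
    let mut prev = BTreeSet::new(); // the first cycle answer is the empty set
    loop {
        let next = step(&prev, edges, start);
        if next == prev {
            return next; // fixed point reached
        }
        prev = next;
    }
}

fn main() {
    // A cyclic graph: 0 -> 1 -> 2 -> 0.
    let edges = [(0, 1), (1, 2), (2, 0)];
    let r = reachable(&edges, 0);
    assert_eq!(r, [0, 1, 2].into_iter().collect());
}
```

Because each iteration's result set only grows (and here is bounded by the set of nodes), the loop must terminate; an unbounded rule set is exactly the "infinitely growing set of results" failure mode described above.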
FAQ and troubleshooting
Why am I getting errors about undefined references to crate::FormalityLang?
The various derive macros need to know what language you are working in.
To figure this out, they reference crate::FormalityLang, which you must define.
See the chapter on defining your language for more details.