About this guide

This guide is meant to help document how rustc – the Rust compiler – works, as well as to help new contributors get involved in rustc development. It is not meant to replace code documentation – each chapter gives only high-level details – the kinds of things that (ideally) don't change frequently.

The guide itself is of course open-source as well, and the sources can be found at the GitHub repository. If you find any mistakes in the guide, please file an issue about it, or even better, open a PR with a correction!

About the compiler team

rustc is maintained by the Rust compiler team. The people who belong to this team collectively work to track regressions and implement new features. Members of the Rust compiler team are people who have made significant contributions to rustc and its design.

Discussion

Currently the compiler team chats in a number of places. There is an ongoing thread on the internals board about trying to find a permanent home. In any case, you can find people in one of three places at the moment:

Rust compiler meeting

The compiler team has a weekly meeting where we do triage and try to generally stay on top of new bugs, regressions, and other things. The general plan for this meeting can be found in the rust-compiler-meeting etherpad. It works roughly as follows:

  • Review P-high bugs: P-high bugs are those that are sufficiently important for us to actively track progress. P-high bugs should ideally always have an assignee.
  • Look over new regressions: we then look for new cases where the compiler broke previously working code in the wild. Regressions are almost always marked as P-high; the major exception would be bug fixes (though even there we often aim to give warnings first).
  • Check I-nominated issues: These are issues where feedback from the team is desired.
  • Check for beta nominations: These are nominations of things to backport to beta.

The meeting currently takes place on Thursdays at 10am Boston time (typically UTC-4, but daylight saving time sometimes makes things complicated).

The meeting is held over a "chat medium" — it used to be IRC, but we are currently in the process of evaluating other alternatives. Check the etherpad to find the current home (and see this internals thread for some ongoing discussion).

Team membership

Membership in the Rust compiler team is typically offered when someone has been making significant contributions to the compiler for some time. Membership is both a recognition and an obligation: compiler team members are generally expected to help with upkeep as well as doing reviews and other work.

If you are interested in becoming a compiler team member, the first thing to do is to start fixing some bugs, or get involved in a working group. One good way to find bugs is to look for open issues tagged with E-easy or E-mentor.

r+ rights

Once you have made a number of individual PRs to rustc, we will often offer r+ privileges. This means that you have the right to instruct "bors" (the robot that manages which PRs get landed into rustc) to merge a PR (here are some instructions for how to talk to bors).

The guidelines for reviewers are as follows:

  • You are always welcome to review any PR, regardless of who it is assigned to. However, do not r+ PRs unless:
    • You are confident in that part of the code.
    • You are confident that nobody else wants to review it first.
      • For example, sometimes people will express a desire to review a PR before it lands, perhaps because it touches a particularly sensitive part of the code.
  • Always be polite when reviewing: you are a representative of the Rust project, so it is expected that you will go above and beyond when it comes to the Code of Conduct.

high-five

Once you have r+ rights, you can also be added to the high-five rotation. high-five is the bot that assigns incoming PRs to reviewers. If you are added, you will be randomly selected to review PRs. If you find you are assigned a PR that you don't feel comfortable reviewing, you can also leave a comment like r? @so-and-so to assign to someone else — if you don't know who to request, just write r? @nikomatsakis for reassignment and @nikomatsakis will pick someone for you.

Getting on the high-five list is much appreciated as it lowers the review burden for all of us! However, if you don't have time to give people timely feedback on their PRs, it may be better that you don't get on the list.

Full team membership

Full team membership is typically extended once someone has made many contributions to the Rust compiler over time, ideally (but not necessarily) to multiple areas. Sometimes this might be implementing a new feature, but it is also important (perhaps more important!) to have the time and willingness to help out with general upkeep such as bugfixes, tracking regressions, and other less glamorous work.

How to build the compiler and run what you built

The compiler is built using a tool called x.py. You will need to have Python installed to run it. But before we get to that, if you're going to be hacking on rustc, you'll want to tweak the configuration of the compiler. The default configuration is oriented towards running the compiler as a user, not a developer.

Create a config.toml

To start, copy config.toml.example to config.toml:

> cd $RUST_CHECKOUT
> cp config.toml.example config.toml

Then you will want to open up the file and change the following settings (and possibly others, such as llvm.ccache):

[llvm]
# Enables LLVM assertions, which will check that the LLVM bitcode generated
# by the compiler is internally consistent. These are particularly helpful
# if you edit `codegen`.
assertions = true

[rust]
# This enables some assertions, but more importantly it enables the `debug!`
# logging macros that are essential for debugging rustc.
debug-assertions = true

# This will make your build more parallel; it costs a bit of runtime
# performance perhaps (less inlining) but it's worth it.
codegen-units = 0

# I always enable full debuginfo, though debuginfo-lines is more important.
debuginfo = true

# Gives you line numbers for backtraces.
debuginfo-lines = true

What is x.py?

x.py is the script used to orchestrate the tooling in the rustc repository. It is the script that can build docs, run tests, and compile rustc. It is now the preferred way to build rustc, and it replaces the old makefiles. Below are the different ways to use x.py for various common tasks in the repo.

Running x.py and building a stage1 compiler

One thing to keep in mind is that rustc is a bootstrapping compiler. That is, since rustc is written in Rust, we need to use an older version of the compiler to compile the newer version. In particular, the newer version of the compiler, libstd, and other tooling may use some unstable features internally. The result is that compiling rustc is done in stages:

  • Stage 0: the stage0 compiler is usually the current beta compiler (x.py will download it for you); you can configure x.py to use something else, though.
  • Stage 1: the code in your clone (the new version) is then compiled with the stage0 compiler to produce the stage1 compiler. However, it was built with an older compiler (stage0), so to optimize the stage1 compiler we go to the next stage.
    • (In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. In particular, the stage1 compiler itself was built by stage0 and hence not by the source in your working directory: this means that the symbol names used in the compiler source may not match the symbol names that would have been made by the stage1 compiler. This can be important when using dynamic linking (e.g., with derives). Sometimes this means that some tests don't work when run with stage1.)
  • Stage 2: we rebuild our stage1 compiler with itself to produce the stage2 compiler (i.e. it builds itself) to have all the latest optimizations. (By default, we copy the stage1 libraries for use by the stage2 compiler, since they ought to be identical.)
  • (Optional) Stage 3: to sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.
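To make the staging concrete, here is a toy Rust model of which compiler produces which. It is only a sketch: `built_by` and `describe` are hypothetical names invented for illustration, not part of x.py or rustc.

```rust
/// Toy model of the bootstrap stages described above.
/// These names are illustrative only; they are not x.py or rustc APIs.

/// Returns the stage of the compiler that builds the compiler for `stage`,
/// or `None` for stage 0, which is downloaded rather than built locally.
fn built_by(stage: u32) -> Option<u32> {
    if stage == 0 {
        None
    } else {
        Some(stage - 1)
    }
}

/// Describes where the compiler for a given stage comes from.
fn describe(stage: u32) -> String {
    match built_by(stage) {
        None => "stage0: downloaded (usually the current beta)".to_string(),
        Some(prev) => format!("stage{}: your source, compiled by stage{}", stage, prev),
    }
}

fn main() {
    for stage in 0..=2 {
        println!("{}", describe(stage));
    }
}
```

The point of the model is that stage1 and stage2 come from the same source but from different compilers, which is why they can differ in subtle ways.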

Build Flags

There are other flags you can pass to the build portion of x.py that can be beneficial to cutting down compile times or fitting other things you might need to change. They are:

Options:
    -v, --verbose       use verbose output (-vv for very verbose)
    -i, --incremental   use incremental compilation
        --config FILE   TOML configuration file for build
        --build BUILD   build target of the stage0 compiler
        --host HOST     host targets to build
        --target TARGET target targets to build
        --on-fail CMD   command to run on failure
        --stage N       stage to build
        --keep-stage N  stage to keep without recompiling
        --src DIR       path to the root of the rust checkout
    -j, --jobs JOBS     number of jobs to run in parallel
    -h, --help          print this help message

For hacking, often building the stage 1 compiler is enough, but for final testing and release, the stage 2 compiler is used.

./x.py check is a really fast way to check that the rust compiler still builds. It is, in particular, very useful when you're doing some kind of "type-based refactoring", like renaming a method, or changing the signature of some function.

Once you've created a config.toml, you are now ready to run x.py. There are a lot of options here, but let's start with what is probably the best "go to" command for building a local rust:

> ./x.py build -i --stage 1 src/libstd

This may look like it only builds libstd, but that is not the case. What this command does is the following:

  • Build libstd using the stage0 compiler (using incremental)
  • Build librustc using the stage0 compiler (using incremental)
    • This produces the stage1 compiler
  • Build libstd using the stage1 compiler (cannot use incremental)

This final product (stage1 compiler + libs built using that compiler) is what you need to build other rust programs.

Note that the command includes the -i switch. This enables incremental compilation. This will be used to speed up the first two steps of the process: in particular, if you make a small change, we ought to be able to use your old results to make producing the stage1 compiler faster.

Unfortunately, incremental cannot be used to speed up making the stage1 libraries. This is because incremental only works when you run the same compiler twice in a row. In this case, we are building a new stage1 compiler every time. Therefore, the old incremental results may not apply. As a result, you will probably find that building the stage1 libstd is a bottleneck for you -- but fear not, there is a (hacky) workaround. See the section on "recommended workflows" below.
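The reasoning above can be sketched as a toy model in Rust. This is an assumption-laden illustration: `IncrCache`, its `compiler_id` field, and the identifier strings are hypothetical, and real incremental compilation is far more involved than a single equality check.

```rust
/// Toy model of why incremental results are tied to one compiler.
/// `IncrCache` and its field are hypothetical, not rustc internals.
struct IncrCache {
    /// Identity of the compiler that produced the cached results.
    compiler_id: String,
}

impl IncrCache {
    /// Cached results can be reused only by the same compiler that wrote them.
    fn can_reuse(&self, current_compiler_id: &str) -> bool {
        self.compiler_id == current_compiler_id
    }
}

fn main() {
    // stage0 stays the same between builds, so its cache stays valid.
    let stage0_cache = IncrCache { compiler_id: "stage0-beta".to_string() };
    println!("stage0 reuse: {}", stage0_cache.can_reuse("stage0-beta"));

    // Each edit to the compiler source yields a different stage1 compiler,
    // so results cached by the previous stage1 no longer apply.
    let stage1_cache = IncrCache { compiler_id: "stage1-build-1".to_string() };
    println!("stage1 reuse: {}", stage1_cache.can_reuse("stage1-build-2"));
}
```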

Note that this whole command just gives you a subset of the full rustc build. The full rustc build (what you get if you just say ./x.py build) has quite a few more steps:

  • Build librustc and rustc with the stage1 compiler.
    • The resulting compiler here is called the "stage2" compiler.
  • Build libstd with stage2 compiler.
  • Build librustdoc and a bunch of other things with the stage2 compiler.

Build specific components

Build only the libcore library

> ./x.py build src/libcore

Build the libcore and libproc_macro library only

> ./x.py build src/libcore src/libproc_macro

Build only libcore up to Stage 1

> ./x.py build src/libcore --stage 1

Sometimes you might just want to test if the part you’re working on can compile. Using these commands you can test that it compiles before doing a bigger build to make sure it works with the compiler. As shown before you can also pass flags at the end such as --stage.

Creating a rustup toolchain

Once you have successfully built rustc, you will have created a bunch of files in your build directory. In order to actually run the resulting rustc, we recommend creating rustup toolchains. The first one will run the stage1 compiler (which we built above). The second will execute the stage2 compiler (which we did not build, but which you will likely need to build at some point; for example, if you want to run the entire test suite).

> rustup toolchain link stage1 build/<host-triple>/stage1
> rustup toolchain link stage2 build/<host-triple>/stage2

The <host-triple> would typically be one of the following:

  • Linux: x86_64-unknown-linux-gnu
  • Mac: x86_64-apple-darwin
  • Windows: x86_64-pc-windows-msvc
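If you script this step, the directory to link can be computed from the triple. The following is a minimal sketch; `typical_host_triple` and `toolchain_dir` are hypothetical helpers, and the mapping covers only the three typical triples listed above, so other hosts need a different value.

```rust
use std::path::PathBuf;

/// Maps an OS name (as reported by `std::env::consts::OS`) to the typical
/// host triple listed above. Other platforms return `None`.
fn typical_host_triple(os: &str) -> Option<&'static str> {
    match os {
        "linux" => Some("x86_64-unknown-linux-gnu"),
        "macos" => Some("x86_64-apple-darwin"),
        "windows" => Some("x86_64-pc-windows-msvc"),
        _ => None,
    }
}

/// Builds the directory passed to `rustup toolchain link`,
/// i.e. `build/<host-triple>/stage<N>`.
fn toolchain_dir(host_triple: &str, stage: u32) -> PathBuf {
    PathBuf::from("build")
        .join(host_triple)
        .join(format!("stage{}", stage))
}

fn main() {
    match typical_host_triple(std::env::consts::OS) {
        Some(triple) => println!("{}", toolchain_dir(triple, 1).display()),
        None => println!("no typical triple known for this OS"),
    }
}
```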

Now you can run the rustc you built. If you run with -vV, you should see a version number ending in -dev, indicating a build from your local environment:

> rustc +stage1 -vV
rustc 1.25.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.25.0-dev
LLVM version: 4.0

Suggested workflows for faster builds of the compiler

There are two workflows that are useful for faster builds of the compiler.

Check, check, and check again. The first workflow, which is useful when doing simple refactorings, is to run ./x.py check continuously. Here you are just checking that the compiler can build, but often that is all you need (e.g., when renaming a method). You can then run ./x.py build when you actually need to run tests.

In fact, it is sometimes useful to put off tests even when you are not 100% sure the code will work. You can then keep building up refactoring commits and only run the tests at some later time. You can then use git bisect to track down precisely which commit caused the problem. A nice side-effect of this style is that you are left with a fairly fine-grained set of commits at the end, all of which build and pass tests. This often helps reviewing.

Incremental builds with --keep-stage. Sometimes just checking whether the compiler builds is not enough. A common example is that you need to add a debug! statement to inspect the value of some state or better understand the problem. In that case, you really need a full build. By leveraging incremental, though, you can often get these builds to complete very fast (e.g., around 30 seconds): the only catch is this requires a bit of fudging and may produce compilers that don't work (but that is easily detected and fixed).

The sequence of commands you want is as follows:

  • Initial build: ./x.py build -i --stage 1 src/libstd
  • Subsequent builds: ./x.py build -i --stage 1 src/libstd --keep-stage 1
    • Note that we added the --keep-stage 1 flag here

The effect of --keep-stage 1 is that we just assume that the old standard library can be re-used. If you are editing the compiler, this is almost always true: you haven't changed the standard library, after all. But sometimes, it's not true: for example, if you are editing the "metadata" part of the compiler, which controls how the compiler encodes types and other states into the rlib files, or if you are editing things that wind up in the metadata (such as the definition of the MIR).

The TL;DR is that you might get weird behavior from a compile when using --keep-stage 1 -- for example, strange ICEs or other panics. In that case, you should simply remove the --keep-stage 1 from the command and rebuild. That ought to fix the problem.

You can also use --keep-stage 1 when running tests. Something like this:

  • Initial test run: ./x.py test -i --stage 1 src/test/ui
  • Subsequent test run: ./x.py test -i --stage 1 src/test/ui --keep-stage 1

Other x.py commands

Here are a few other useful x.py commands. We'll cover some of them in detail in other sections:

  • Building things:
    • ./x.py clean – clean up the build directory (rm -rf build works too, but then you have to rebuild LLVM)
    • ./x.py build --stage 1 – builds everything using the stage 1 compiler, not just up to libstd
    • ./x.py build – builds the stage2 compiler
  • Running tests (see the section on running tests for more details):
    • ./x.py test --stage 1 src/libstd – runs the #[test] tests from libstd
    • ./x.py test --stage 1 src/test/run-pass – runs the run-pass test suite

ctags

One of the challenges with rustc is that the RLS can't handle it, making code navigation difficult. One solution is to use ctags. The following script can be used to set it up: https://github.com/nikomatsakis/rust-etags.

CTAGS integrates into emacs and vim quite easily. The following can then be used to build and generate tags:

$ rust-ctags src/lib* && ./x.py build <something>

This allows you to do "jump-to-def" with whatever functions were around when you last built, which is ridiculously useful.

Cleaning out build directories

Sometimes you need to start fresh, but this is normally not the case. If you need to run this then rustbuild is most likely not acting right and you should file a bug as to what is going wrong. If you do need to clean everything up then you only need to run one command!

> ./x.py clean

Compiler Documentation

The documentation for the rust components is found at rustc doc.

Build distribution artifacts

You might want to build and package up the compiler for distribution. You’ll want to run this command to do it:

./x.py dist

Install distribution artifacts

If you’ve built a distribution artifact you might want to install it and test that it works on your target system. You’ll want to run this command:

./x.py install

Note: If you are testing out a modification to a compiler, you might want to use it to compile some project. Usually, you do not want to use ./x.py install for testing. Rather, you should create a toolchain as discussed here.

For example, if the toolchain you created is called foo, you would then invoke it with rustc +foo ... (where ... represents the rest of the arguments).

Documenting rustc

You might want to build documentation of the various components available, like the standard library. There are two ways to go about this. You can run rustdoc directly on the file to make sure the HTML is correct, which is fast. Alternatively, you can build the documentation as part of the build process through x.py. Both are viable methods since documentation is more about the content.

Document everything

./x.py doc

If you want to avoid the whole Stage 2 build

./x.py doc --stage 1

First the compiler and rustdoc get built to make sure everything is okay and then it documents the files.

Document specific components

   ./x.py doc src/doc/book
   ./x.py doc src/doc/nomicon
   ./x.py doc src/doc/book src/libstd

Much like individual tests or building certain components you can build only the documentation you want.

Document internal rustc items

Compiler documentation is not built by default; there is a flag in config.toml to enable it. When enabled, the compiler documentation does include internal items.

Next open up config.toml and make sure these two lines are set to true:

docs = true
compiler-docs = true

When you want to build the compiler docs as well run this command:

./x.py doc

This will see that the docs and compiler-docs options are set to true and build the normally hidden compiler docs!

This file offers some tips on the coding conventions for rustc. This chapter covers formatting, coding for correctness, using crates from crates.io, and some tips on structuring your PR for easy review.

Formatting and the tidy script

rustc is slowly moving towards the Rust standard coding style; at the moment, however, it follows a rather more chaotic style. We do have some mandatory formatting conventions, which are automatically enforced by a script we affectionately call the "tidy" script. The tidy script runs automatically when you do ./x.py test and can be run in isolation with ./x.py test src/tools/tidy.

Copyright notice

Some existing files begin with a copyright and license notice. Please omit this notice for new files licensed under the standard terms (dual MIT/Apache-2.0). For existing files, the year at the top is not meaningful: copyright protections are in fact automatic from the moment of authorship. We do not typically edit the years on existing files.

Line length

Lines should be at most 100 characters. It's even better if you can keep things to 80.

Ignoring the line length limit. Sometimes – in particular for tests – it can be necessary to exempt yourself from this limit. In that case, you can add a comment towards the top of the file (after the copyright notice) like so:


// ignore-tidy-linelength
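To illustrate the rule, here is a simplified sketch of a line-length check in the spirit of tidy. It is not the real tidy implementation; `MAX_LINE_LEN` and `overlong_lines` are names made up for this example, and the opt-out handling is an assumption based only on the comment shown above.

```rust
/// Simplified sketch of a tidy-style line-length check; the real tidy
/// script is more involved. `MAX_LINE_LEN` mirrors the 100-character rule.
const MAX_LINE_LEN: usize = 100;

/// Returns the 1-based numbers of lines longer than `MAX_LINE_LEN`,
/// or an empty list if the file opts out with `ignore-tidy-linelength`.
fn overlong_lines(contents: &str) -> Vec<usize> {
    if contents.contains("ignore-tidy-linelength") {
        return Vec::new();
    }
    contents
        .lines()
        .enumerate()
        .filter(|(_, line)| line.chars().count() > MAX_LINE_LEN)
        .map(|(idx, _)| idx + 1)
        .collect()
}

fn main() {
    let long_line = "x".repeat(101);
    let checked = format!("fn main() {{}}\n// {}\n", long_line);
    println!("{:?}", overlong_lines(&checked));

    // The same contents with the opt-out comment are not reported.
    let exempt = format!("// ignore-tidy-linelength\n{}", checked);
    println!("{:?}", overlong_lines(&exempt));
}
```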

Tabs vs spaces

Prefer 4-space indent.

Coding for correctness

Beyond formatting, there are a few other tips that are worth following.

Prefer exhaustive matches

Using _ in a match is convenient, but it means that when new variants are added to the enum, they may not get handled correctly. Ask yourself: if a new variant were added to this enum, what's the chance that it would want to use the _ code, versus having some other treatment? Unless the answer is "low", then prefer an exhaustive match. (The same advice applies to if let and while let, which are effectively tests for a single variant.)
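As a small illustration of the difference, consider the sketch below. The `Visibility` enum and both functions are hypothetical and exist only for this example.

```rust
/// Hypothetical enum used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Public,
    Crate,
    Private,
}

/// Wildcard version: a variant added later silently falls into `_`.
fn is_exported_wildcard(vis: Visibility) -> bool {
    match vis {
        Visibility::Public => true,
        _ => false,
    }
}

/// Exhaustive version: adding a variant makes this a compile error
/// until someone decides how the new variant should be handled.
fn is_exported_exhaustive(vis: Visibility) -> bool {
    match vis {
        Visibility::Public => true,
        Visibility::Crate | Visibility::Private => false,
    }
}

fn main() {
    for &vis in &[Visibility::Public, Visibility::Crate, Visibility::Private] {
        // Today both versions agree; they only diverge once the enum grows.
        assert_eq!(is_exported_wildcard(vis), is_exported_exhaustive(vis));
        println!("{:?}: {}", vis, is_exported_exhaustive(vis));
    }
}
```

If a new variant were added to `Visibility`, the wildcard version would keep compiling and quietly return `false` for it, while the exhaustive version would force a decision at compile time.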

Use "TODO" comments for things you don't want to forget

As a useful tool to yourself, you can insert a // TODO comment for something that you want to get back to before you land your PR:

fn do_something() {
    if something_else {
        unimplemented!(); // TODO write this
    }
}

The tidy script will report an error for a // TODO comment, so this code would not be able to land until the TODO is fixed (or removed).

This can also be useful in a PR as a way to signal from one commit that you are leaving a bug that a later commit will fix:

if foo {
    return true; // TODO wrong, but will be fixed in a later commit
}
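Conceptually, the check that blocks such code from landing can be pictured like this. The sketch is a simplification and not the real tidy code; `todo_lines` is a hypothetical helper.

```rust
/// Simplified sketch of a tidy-style TODO check (not the real tidy code).
/// Returns the 1-based numbers of lines containing a `// TODO` comment.
fn todo_lines(contents: &str) -> Vec<usize> {
    contents
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains("// TODO"))
        .map(|(idx, _)| idx + 1)
        .collect()
}

fn main() {
    let src = "if foo {\n    return true; // TODO wrong, fixed later\n}\n";
    let found = todo_lines(src);
    if found.is_empty() {
        println!("ok to land");
    } else {
        println!("TODO comments on lines {:?}", found);
    }
}
```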

Using crates from crates.io

It is allowed to use crates from crates.io, though external dependencies should not be added gratuitously. All such crates must have a suitably permissive license. There is an automatic check which inspects the Cargo metadata to ensure this.

How to structure your PR

How you prepare the commits in your PR can make a big difference for the reviewer. Here are some tips.

Isolate "pure refactorings" into their own commit. For example, if you rename a method, then put that rename into its own commit, along with the renames of all the uses.

More commits is usually better. If you are doing a large change, it's almost always better to break it up into smaller steps that can be independently understood. The one thing to be aware of is that if you introduce some code following one strategy, then change it dramatically (versus adding to it) in a later commit, that 'back-and-forth' can be confusing.

If you run rustfmt and the file was not already formatted, isolate that into its own commit. This is really the same as the previous rule, but it's worth highlighting. It's ok to rustfmt files, but since we do not currently run rustfmt all the time, that can introduce a lot of noise into your commit. Please isolate that into its own commit. This also makes rebases a lot less painful, since rustfmt tends to cause a lot of merge conflicts, and having those isolated into their own commit makes them easier to resolve.

No merges. We do not allow merge commits into our history, other than those by bors. If you get a merge conflict, rebase instead via a command like git rebase -i rust-lang/master (presuming you use the name rust-lang for your remote).

Individual commits do not have to build (but it's nice). We do not require that every intermediate commit successfully builds – we only expect to be able to bisect at a PR level. However, if you can make individual commits build, that is always helpful.

Walkthrough: a typical contribution

There are a lot of ways to contribute to the rust compiler, including fixing bugs, improving performance, helping design features, providing feedback on existing features, etc. This chapter does not try to cover all of them. Instead, it walks through the design and implementation of a new feature. Not all of the steps and processes described here are needed for every contribution, and I will try to point those out as they arise.

In general, if you are interested in making a contribution and aren't sure where to start, please feel free to ask!

Overview

The feature I will discuss in this chapter is the ? Kleene operator for macros. Basically, we want to be able to write something like this:

macro_rules! foo {
    ($arg:ident $(, $optional_arg:ident)?) => {
        println!("{}", $arg);

        $(
            println!("{}", $optional_arg);
        )?
    }
}

fn main() {
    let x = 0;
    foo!(x); // ok! prints "0"
    foo!(x, x); // ok! prints "0" twice, on separate lines
}

So basically, the $(pat)? matcher in the macro means "this pattern can occur 0 or 1 times", similar to other regex syntaxes.
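To see the "0 or 1 times" behavior as data rather than printed output, here is a small variant of the macro that returns what it matched. `collect_args!` is a hypothetical name used only for this sketch, and it assumes a compiler recent enough to support the ? operator.

```rust
/// Variant of `foo!` that returns what it matched, so the "0 or 1 times"
/// behavior of `$(...)?` is easy to inspect. `collect_args!` is a
/// hypothetical name used only for this sketch.
macro_rules! collect_args {
    ($arg:expr $(, $optional_arg:expr)?) => {{
        let mut seen = Vec::new();
        seen.push($arg);
        // This block is expanded once if the optional argument was given,
        // and not at all otherwise.
        $(
            seen.push($optional_arg);
        )?
        seen
    }};
}

fn main() {
    let x = 0;
    println!("{:?}", collect_args!(x));
    println!("{:?}", collect_args!(x, 7));
}
```

A third argument, as in `collect_args!(x, 7, 8)`, would be rejected at compile time, since the optional group can match at most once.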

There were a number of steps to go from an idea to a stable rust feature. Here is a quick list. We will go through each of these in order below. As I mentioned before, not all of these are needed for every type of contribution.

  • Idea discussion/Pre-RFC A Pre-RFC is an early draft or design discussion of a feature. This stage is intended to flesh out the design space a bit and get a grasp on the different merits and problems with an idea. It's a great way to get early feedback on your idea before presenting it to the wider audience. You can find the original discussion here.
  • RFC This is when you formally present your idea to the community for consideration. You can find the RFC here.
  • Implementation Implement your idea unstably in the compiler. You can find the original implementation here.
  • Possibly iterate/refine As the community gets experience with your feature on the nightly compiler and in libstd, there may be additional feedback about design choices that might be adjusted. This particular feature went through a number of iterations.
  • Stabilization When your feature has baked enough, a rust team member may propose to stabilize it. If there is consensus, this is done.
  • Relax Your feature is now a stable rust feature!

Pre-RFC and RFC

NOTE: In general, if you are not proposing a new feature or substantial change to rust or the ecosystem, you don't need to follow the RFC process. Instead, you can just jump to implementation.

You can find the official guidelines for when to open an RFC here.

An RFC is a document that describes the feature or change you are proposing in detail. Anyone can write an RFC; the process is the same for everyone, including rust team members.

To open an RFC, open a PR on the rust-lang/rfcs repo on GitHub. You can find detailed instructions in the README.

Before opening an RFC, you should do the research to "flesh out" your idea. Hastily-proposed RFCs tend not to be accepted. You should generally have a good description of the motivation, impact, disadvantages, and potential interactions with other features.

If that sounds like a lot of work, it's because it is. But no fear! Even if you're not a compiler hacker, you can get great feedback by doing a pre-RFC. This is an informal discussion of the idea. The best place to do this is internals.rust-lang.org. Your post doesn't have to follow any particular structure. It doesn't even need to be a cohesive idea. Generally, you will get tons of feedback that you can integrate back to produce a good RFC.

(Another pro-tip: try searching the RFCs repo and internals for prior related ideas. A lot of times an idea has already been considered and was either rejected or postponed to be tried again later. This can save you and everybody else some time)

In the case of our example, a participant in the pre-RFC thread pointed out a syntax ambiguity and a potential resolution. Also, the overall feedback seemed positive. In this case, the discussion converged pretty quickly, but for some ideas, a lot more discussion can happen (e.g. see this RFC which received a whopping 684 comments!). If that happens, don't be discouraged; it means the community is interested in your idea, but it perhaps needs some adjustments.

The RFC for our ? macro feature did receive some discussion on the RFC thread too. As with most RFCs, there were a few questions that we couldn't answer by discussion: we needed experience using the feature to decide. Such questions are listed in the "Unresolved Questions" section of the RFC. Also, over the course of the RFC discussion, you will probably want to update the RFC document itself to reflect the course of the discussion (e.g. new alternatives or prior work may be added or you may decide to change parts of the proposal itself).

In the end, when the discussion seems to reach a consensus and die down a bit, a rust team member may propose to move to FCP with one of three possible dispositions. This means that they want the other members of the appropriate teams to review and comment on the RFC. More discussion may ensue, which may result in more changes or unresolved questions being added. At some point, when everyone is satisfied, the RFC enters the "final comment period" (FCP), which is the last chance for people to bring up objections. When the FCP is over, the disposition is adopted. Here are the three possible dispositions:

  • Merge: accept the feature. Here is the proposal to merge for our ? macro feature.
  • Close: this feature in its current form is not a good fit for rust. Don't be discouraged if this happens to your RFC, and don't take it personally. This is not a reflection on you, but rather a community decision that rust will go a different direction.
  • Postpone: there is interest in going this direction but not at the moment. This happens most often because the appropriate rust team doesn't have the bandwidth to shepherd the feature through the process to stabilization. Often this is the case when the feature doesn't fit into the team's roadmap. Postponed ideas may be revisited later.

When an RFC is merged, the PR is merged into the RFCs repo. A new tracking issue is created in the rust-lang/rust repo to track progress on the feature and discuss unresolved questions, implementation progress and blockers, etc. Here is the tracking issue for our ? macro feature.

Implementation

To make a change to the compiler, open a PR against the rust-lang/rust repo.

Depending on the feature/change/bug fix/improvement, implementation may be relatively straightforward or it may be a major undertaking. You can always ask for help or mentorship from more experienced compiler devs. Also, you don't have to be the one to implement your feature; but keep in mind that if you don't, it might be a while before someone else does.

For the ? macro feature, I needed to go understand the relevant parts of macro expansion in the compiler. Personally, I find that improving the comments in the code is a helpful way of making sure I understand it, but you don't have to do that if you don't want to.

I then implemented the original feature, as described in the RFC. When a new feature is implemented, it goes behind a feature gate, which means that you have to use #![feature(my_feature_name)] to use the feature. The feature gate is removed when the feature is stabilized.

Most bug fixes and improvements don't require a feature gate. You can just make your changes/improvements.

When you open a PR on the rust-lang/rust repo, a bot will assign your PR to a reviewer. If there is a particular rust team member you are working with, you can request that reviewer by leaving a comment on the thread with r? @reviewer-github-id (e.g. r? @eddyb). If you don't know who to request, don't request anyone; the bot will assign someone automatically.

The reviewer may request changes before they approve your PR. Feel free to ask questions or discuss things you don't understand or disagree with. However, recognize that the PR won't be merged unless someone on the rust team approves it.

When your reviewer approves the PR, it will go into a queue for yet another bot called @bors. @bors manages the CI build/merge queue. When your PR reaches the head of the @bors queue, @bors will test out the merge by running all tests against your PR on Travis CI. This takes about 2 hours as of this writing. If all tests pass, the PR is merged and becomes part of the next nightly compiler!

There are a couple of things that may happen for some PRs during the review process:

  • If the change is substantial enough, the reviewer may request an FCP on the PR. This gives all members of the appropriate team a chance to review the changes.
  • If the change may cause breakage, the reviewer may request a crater run. This compiles the compiler with your changes and then attempts to compile all crates on crates.io with your modified compiler. This is a great smoke test to check if you introduced a change to compiler behavior that affects a large portion of the ecosystem.
  • If the diff of your PR is large or the reviewer is busy, your PR may have some merge conflicts with other PRs that happen to get merged first. You should fix these merge conflicts using the normal git procedures.

If you are not doing a new feature or something like that (e.g. if you are fixing a bug), then that's it! Thanks for your contribution :)

Refining your implementation

As people get experience with your new feature on nightly, slight changes may be proposed and unresolved questions may become resolved. Updates/changes go through the same process for implementing any other changes, as described above (i.e. submit a PR, go through review, wait for @bors, etc).

Some changes may be major enough to require an FCP and some review by rust team members.

For the ? macro feature, we went through a few different iterations after the original implementation: 1, 2, 3.

Along the way, we decided that ? should not take a separator, which was previously an unresolved question listed in the RFC. We also changed the disambiguation strategy: we decided to remove the ability to use ? as a separator token for other repetition operators (e.g. + or *). However, since this was a breaking change, we decided to do it over an edition boundary. Thus, the new feature can be enabled only in edition 2018. These deviations from the original RFC required another FCP.
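As a rough illustration of the behavior described above, the following sketch shows ? used as a repetition operator with no separator (edition 2018 or later). The macro name greet and its arguments are made up for this example; they do not come from the RFC or the compiler's test suite.

```rust
// Hypothetical macro: `$(, $greeting:expr)?` matches zero or one occurrence.
// Unlike `*` and `+`, the `?` operator is not preceded by a separator token.
macro_rules! greet {
    ($name:expr $(, $greeting:expr)?) => {{
        let mut out = String::new();
        // This fragment is expanded only if the optional part was supplied.
        $(
            out.push_str($greeting);
            out.push(' ');
        )?
        out.push_str($name);
        out
    }};
}
```

Under these assumptions, greet!("world") yields "world" and greet!("world", "hello") yields "hello world".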

Stabilization

Finally, after the feature had baked for a while on nightly, a language team member moved to stabilize it.

A stabilization report needs to be written that includes:

  • brief description of the behavior and any deviations from the RFC
  • which edition(s) are affected and how
  • links to a few tests to show the interesting aspects

The stabilization report for our feature is here.

After this, a PR is made to remove the feature gate, enabling the feature by default (on the 2018 edition). A note is added to the Release notes about the feature.

TODO: currently, we have a forge article about stabilization, but we really ought to move that to the guide (in fact, we probably should have a whole chapter about feature gates and stabilization).

The compiler testing framework

The Rust project runs a wide variety of different tests, orchestrated by the build system (x.py test). The main test harness for testing the compiler itself is a tool called compiletest (sources in src/tools/compiletest). This section gives a brief overview of how the testing framework is set up, and then gets into some of the details on how to run tests as well as how to add new tests.

Compiletest test suites

The compiletest tests are located in the tree in the src/test directory. Immediately within you will see a series of subdirectories (e.g. ui, run-make, and so forth). Each of those directories is called a test suite – they house a group of tests that are run in a distinct mode.

Here is a brief summary of the test suites as of this writing and what they mean. In some cases, the test suites are linked to parts of the manual that give more details.

  • ui – tests that check the exact stdout/stderr from compilation and/or running the test
  • run-pass – tests that are expected to compile and execute successfully (no panics)
    • run-pass-valgrind – tests that ought to run with valgrind
  • run-fail – tests that are expected to compile but then panic during execution
  • compile-fail – tests that are expected to fail compilation.
  • parse-fail – tests that are expected to fail to parse
  • pretty – tests targeting the Rust "pretty printer", which generates valid Rust code from the AST
  • debuginfo – tests that run in gdb or lldb and query the debug info
  • codegen – tests that compile and then test the generated LLVM code to make sure that the optimizations we want are taking effect.
  • mir-opt – tests that check parts of the generated MIR to make sure we are building things correctly or doing the optimizations we expect.
  • incremental – tests for incremental compilation, checking that when certain modifications are performed, we are able to reuse the results from previous compilations.
  • run-make – tests that basically just execute a Makefile; the ultimate in flexibility but quite annoying to write.
  • rustdoc – tests for rustdoc, making sure that the generated files contain the expected documentation.
  • *-fulldeps – same as above, but indicates that the test depends on things other than libstd (and hence those things must be built)

Other Tests

The Rust build system handles running tests for various other things, including:

  • Tidy – This is a custom tool used for validating source code style and formatting conventions, such as rejecting long lines. There is more information in the section on coding conventions.

    Example: ./x.py test src/tools/tidy

  • Unit tests – The Rust standard library and many of the Rust packages include typical Rust #[test] unit tests. Under the hood, x.py will run cargo test on each package to run all the tests.

    Example: ./x.py test src/libstd

  • Doc tests – Example code embedded within Rust documentation is executed via rustdoc --test. Examples:

    ./x.py test src/doc – Runs rustdoc --test for all documentation in src/doc.

    ./x.py test --doc src/libstd – Runs rustdoc --test on the standard library.

  • Link checker – A small tool for verifying href links within documentation.

    Example: ./x.py test src/tools/linkchecker

  • Dist check – This verifies that the source distribution tarball created by the build system will unpack, build, and run all tests.

    Example: ./x.py test distcheck

  • Tool tests – Packages that are included with Rust have all of their tests run as well (typically by running cargo test within their directory). This includes things such as cargo, clippy, rustfmt, rls, miri, bootstrap (testing the Rust build system itself), etc.

  • Cargo test – This is a small tool which runs cargo test on a few significant projects (such as servo, ripgrep, tokei, etc.) just to ensure there aren't any significant regressions.

    Example: ./x.py test src/tools/cargotest

Testing infrastructure

When a Pull Request is opened on GitHub, Travis will automatically launch a build that will run all tests on a single configuration (x86-64 linux). In essence, it runs ./x.py test after building.

The integration bot bors is used for coordinating merges to the master branch. When a PR is approved, it goes into a queue where merges are tested one at a time on a wide set of platforms using Travis and Appveyor (currently over 50 different configurations). Most platforms only run the build steps, some run a restricted set of tests, and only a subset run the full suite of tests (see Rust's platform tiers).

Testing with Docker images

The Rust tree includes Docker image definitions for the platforms used on Travis in src/ci/docker. The script src/ci/docker/run.sh is used to build the Docker image, run it, build Rust within the image, and run the tests.

TODO: What is a typical workflow for testing/debugging on a platform that you don't have easy access to? Do people build Docker images and enter them to test things out?

Testing on emulators

Some platforms are tested via an emulator for architectures that aren't readily available. There is a set of tools for orchestrating running the tests within the emulator. Platforms such as arm-android and arm-unknown-linux-gnueabihf are set up to automatically run the tests under emulation on Travis. The following will take a look at how a target's tests are run under emulation.

The Docker image for armhf-gnu includes QEMU to emulate the ARM CPU architecture. Included in the Rust tree are the tools remote-test-client and remote-test-server, which are programs for sending test programs and libraries to the emulator, running the tests within the emulator, and reading the results. The Docker image is set up to launch remote-test-server and the build tools use remote-test-client to communicate with the server to coordinate running tests (see src/bootstrap/test.rs).

TODO: What are the steps for manually running tests within an emulator? ./src/ci/docker/run.sh armhf-gnu will do everything, but takes hours to run and doesn't offer much help with interacting within the emulator.

Is there any support for emulating other (non-Android) platforms, such as running on an iOS emulator?

Is there anything else interesting that can be said here about running tests remotely on real hardware?

It's also unclear to me how the wasm or asm.js tests are run.

Crater

Crater is a tool for compiling and running tests for every crate on crates.io (and a few on GitHub). It is mainly used for checking the extent of breakage when implementing potentially breaking changes and for ensuring lack of breakage by running beta vs stable compiler versions.

When to run Crater

You should request a crater run if your PR makes large changes to the compiler or could cause breakage. If you are unsure, feel free to ask your PR's reviewer.

Requesting Crater Runs

The rust team maintains a few machines that can be used for crater runs on the changes introduced by a PR. If your PR needs a crater run, leave a comment for the triage team in the PR thread. Please inform the team whether you require a "check-only" crater run, a "build only" crater run, or a "build-and-test" crater run. The difference is primarily in time; the conservative (if you're not sure) option is to go for the build-and-test run. If you are making changes that will only have an effect at compile-time (e.g., implementing a new trait), then you only need a check run.

Your PR will be enqueued by the triage team and the results will be posted when they are ready. Check runs will take around 3-4 days, with the other two taking 5-6 days on average.

While crater is really useful, it is also important to be aware of a few caveats:

  • Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere. Also, companies may not wish to publish their code. Thus, a successful crater run is not a magical green light that there will be no breakage; you still need to be careful.

  • Crater only runs Linux builds on x86_64. Thus, other architectures and platforms are not tested. Critically, this includes Windows.

  • Many crates are not tested. This could be for a lot of reasons, including that the crate doesn't compile any more (e.g. used old nightly features), has broken or flaky tests, requires network access, or other reasons.

  • Before crater can be run, @bors try needs to succeed in building artifacts. This means that if your code doesn't compile, you cannot run crater.

Perf runs

A lot of work is put into improving the performance of the compiler and preventing performance regressions. A "perf run" is used to compare the performance of the compiler in different configurations for a large collection of popular crates. Different configurations include "fresh builds", builds with incremental compilation, etc.

The result of a perf run is a comparison between two versions of the compiler (by their commit hashes).

You should request a perf run if your PR may affect performance, especially if it can affect performance adversely.

Further reading

The following blog posts may also be of interest:

Running tests

You can run the tests using x.py. The most basic command – which you will almost never want to use! – is as follows:

> ./x.py test

This will build the full stage 2 compiler and then run the whole test suite. You probably don't want to do this very often, because it takes a very long time, and anyway bors / travis will do it for you. (Often, I will run this command in the background after opening a PR that I think is done, but rarely otherwise. -nmatsakis)

The test results are cached and previously successful tests are ignored during testing. The stdout/stderr contents as well as a timestamp file for every test can be found under build/ARCH/test/. To force-rerun a test (e.g. in case the test runner fails to notice a change) you can simply remove the timestamp file.

Note that some tests require a Python-enabled gdb. You can test if your gdb install supports Python by using the python command from within gdb. Once invoked you can type some Python code (e.g. print("hi")) followed by return and then CTRL+D to execute it. If you are building gdb from source, you will need to configure with --with-python=<path-to-python-binary>.

Running a subset of the test suites

When working on a specific PR, you will usually want to run a smaller set of tests, and with a stage 1 build. For example, a good "smoke test" that can be used after modifying rustc to see if things are generally working correctly would be the following:

> ./x.py test --stage 1 src/test/{ui,compile-fail,run-pass}

This will run the ui, compile-fail, and run-pass test suites, and only with the stage 1 build. Of course, the choice of test suites is somewhat arbitrary, and may not suit the task you are doing. For example, if you are hacking on debuginfo, you may be better off with the debuginfo test suite:

> ./x.py test --stage 1 src/test/debuginfo

Run only the tidy script

> ./x.py test src/tools/tidy

Run tests on the standard library

> ./x.py test src/libstd

Run tests on the standard library and run the tidy script

> ./x.py test src/libstd src/tools/tidy

Run tests on the standard library using a stage 1 compiler

> ./x.py test src/libstd --stage 1

By listing which test suites you want to run you avoid having to run tests for components you did not change at all.

Warning: Note that bors only runs the tests with the full stage 2 build; therefore, while the tests usually work fine with stage 1, there are some limitations. In particular, the stage1 compiler doesn't work well with procedural macros or custom derive tests.

Running an individual test

Another common thing that people want to do is to run an individual test, often the test they are trying to fix. One way to do this is to invoke x.py with the --test-args option:

> ./x.py test --stage 1 src/test/ui --test-args issue-1234

Under the hood, the test runner invokes the standard rust test runner (the same one you get with #[test]), so this command would wind up filtering for tests that include "issue-1234" in the name.
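The substring filtering described above can be sketched as follows. This is only an illustration of the matching rule, not the test runner's actual code; the function name filter_tests and the test names are hypothetical.

```rust
/// Hypothetical helper: keeps only the test names that contain `filter`
/// as a substring, which is how the filtering above is described.
fn filter_tests<'a>(names: &[&'a str], filter: &str) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| name.contains(filter))
        .collect()
}
```

Note that, because the match is on substrings, a filter of issue-1234 would also select a test named issue-12345.rs if one existed.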

Using incremental compilation

You can further enable the --incremental flag to save additional time in subsequent rebuilds:

> ./x.py test --stage 1 src/test/ui --incremental --test-args issue-1234

If you don't want to include the flag with every command, you can enable it in the config.toml, too:

# Whether to always use incremental compilation when building rustc
incremental = true

Note that incremental compilation will use more disk space than usual. If disk space is a concern for you, you might want to check the size of the build directory from time to time.

Running tests manually

Sometimes it's easier and faster to just run the test by hand. Most tests are just rs files, so you can do something like

> rustc +stage1 src/test/ui/issue-1234.rs

This is much faster, but doesn't always work. For example, some tests include directives that specify specific compiler flags, or which rely on other crates, and they may not run the same without those options.

Adding new tests

In general, we expect every PR that fixes a bug in rustc to come accompanied by a regression test of some kind. This test should fail in master but pass after the PR. These tests are really useful for preventing us from repeating the mistakes of the past.

To add a new test, the first thing you generally do is to create a file, typically a Rust source file. Test files have a particular structure:

Depending on the test suite, there may be some other details to be aware of:

What kind of test should I add?

It can be difficult to know what kind of test to use. Here are some rough heuristics:

  • Some tests have specialized needs:
    • need to run gdb or lldb? use the debuginfo test suite
    • need to inspect LLVM IR or MIR? use the codegen or mir-opt test suites
    • need to run rustdoc? Prefer a rustdoc test
    • need to inspect the resulting binary in some way? Then use run-make
  • For most other things, a ui (or ui-fulldeps) test is to be preferred:
    • ui tests subsume run-pass, compile-fail, and parse-fail tests
    • in the case of warnings or errors, ui tests capture the full output, which makes it easier to review but also helps prevent "hidden" regressions in the output

Naming your test

We have not traditionally had a lot of structure in the names of tests. Moreover, for a long time, the rustc test runner did not support subdirectories (it now does), so test suites like src/test/run-pass have a huge mess of files in them. This is not considered an ideal setup.

For regression tests – basically, some random snippet of code that came in from the internet – we often just name the test after the issue. For example, src/test/run-pass/issue-12345.rs. If possible, though, it is better if you can put the test into a directory that helps identify what piece of code is being tested here (e.g., borrowck/issue-12345.rs is much better), or perhaps give it a more meaningful name. Still, do include the issue number somewhere.

When writing a new feature, create a subdirectory to store your tests. For example, if you are implementing RFC 1234 ("Widgets"), then it might make sense to put the tests in directories like:

  • src/test/ui/rfc1234-widgets/
  • src/test/run-pass/rfc1234-widgets/
  • etc

In other cases, there may already be a suitable directory. (The proper directory structure to use is actually an area of active debate.)

Comment explaining what the test is about

When you create a test file, include a comment summarizing the point of the test at the start of the file. This should highlight which parts of the test are more important, and what the bug was that the test is fixing. Citing an issue number is often very helpful.

This comment doesn't have to be super extensive. Just something like "Regression test for #18060: match arms were matching in the wrong order." might already be enough.

These comments are very useful to others later on when your test breaks, since they often can highlight what the problem is. They are also useful if for some reason the tests need to be refactored, since they let others know which parts of the test were important (often a test must be rewritten because it no longer tests what it was meant to test, and then it's useful to know what it was meant to test exactly).

Header commands: configuring rustc

Header commands are special comments that the test runner knows how to interpret. They must appear before the Rust source in the test. They are normally put after the short comment that explains the point of this test. For example, this test uses the // compile-flags command to specify a custom flag to give to rustc when the test is compiled:

// Test the behavior of `0 - 1` when overflow checks are disabled.

// compile-flags: -Coverflow-checks=off

fn main() {
    let x = 0 - 1;
    ...
}

Ignoring tests

These are used to ignore the test in some situations, which means the test won't be compiled or run.

  • ignore-X where X is a target detail or stage will ignore the test accordingly (see below)
  • only-X is like ignore-X, but will only run the test on that target or stage
  • ignore-pretty will not compile the pretty-printed test (this is done to test the pretty-printer, but might not always work)
  • ignore-test always ignores the test
  • ignore-lldb and ignore-gdb will skip a debuginfo test on that debugger.
  • ignore-gdb-version can be used to ignore the test when certain gdb versions are used

Some examples of X in ignore-X:

  • Architecture: aarch64, arm, asmjs, mips, wasm32, x86_64, x86, ...
  • OS: android, emscripten, freebsd, ios, linux, macos, windows, ...
  • Environment (fourth word of the target triple): gnu, msvc, musl.
  • Pointer width: 32bit, 64bit.
  • Stage: stage0, stage1, stage2.
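To make the ignore-X / only-X semantics above more concrete, here is a minimal sketch. It assumes the current target can be summarized as a list of detail words (architecture, OS, environment, pointer width, stage); the function should_run is hypothetical and is not how compiletest is actually structured. Debugger- and pretty-printer-specific directives such as ignore-gdb are not modeled.

```rust
/// Hypothetical helper: decides whether a test runs, given one header
/// directive and the detail words describing the current target.
fn should_run(directive: &str, target_details: &[&str]) -> bool {
    // `ignore-test` always ignores the test.
    if directive == "ignore-test" {
        return false;
    }
    // `ignore-X`: skip the test when X describes the current target.
    if let Some(x) = directive.strip_prefix("ignore-") {
        return !target_details.contains(&x);
    }
    // `only-X`: run the test only when X describes the current target.
    if let Some(x) = directive.strip_prefix("only-") {
        return target_details.contains(&x);
    }
    // Any other directive does not affect whether the test runs.
    true
}
```

For a target summarized as x86_64, linux, gnu, 64bit, stage1, the directive ignore-linux would skip the test, while only-64bit would let it run.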

Other Header Commands

Here is a list of other header commands. This list is not exhaustive. Header commands can generally be found by browsing the TestProps structure found in header.rs from the compiletest source.

  • run-rustfix for UI tests, indicates that the test produces structured suggestions. The test writer should create a .fixed file, which contains the source with the suggestions applied. When the test is run, compiletest first checks that the correct lint/warning is generated. Then, it applies the suggestion and compares against .fixed (they must match). Finally, the fixed source is compiled, and this compilation is required to succeed. The .fixed file can also be generated automatically with the --bless option, discussed below.
  • min-gdb-version specifies the minimum gdb version required for this test; see also ignore-gdb-version
  • min-lldb-version specifies the minimum lldb version required for this test
  • rust-lldb causes the lldb part of the test to only be run if the lldb in use contains the Rust plugin
  • no-system-llvm causes the test to be ignored if the system llvm is used
  • min-llvm-version specifies the minimum llvm version required for this test
  • min-system-llvm-version specifies the minimum system llvm version required for this test; the test is ignored if the system llvm is in use and it doesn't meet the minimum version. This is useful when an llvm feature has been backported to rust-llvm
  • ignore-llvm-version can be used to skip the test when certain LLVM versions are used. This takes one or two arguments; the first argument is the first version to ignore. If no second argument is given, all subsequent versions are ignored; otherwise, the second argument is the last version to ignore.
  • compile-pass for UI tests, indicates that the test is supposed to compile, as opposed to the default where the test is supposed to error out.
  • compile-flags passes extra command-line args to the compiler, e.g. compile-flags: -g which forces debuginfo to be enabled.
  • should-fail indicates that the test should fail; used for "meta testing", where we test the compiletest program itself to check that it will generate errors in appropriate scenarios. This header is ignored for pretty-printer tests.
  • gate-test-X where X is a feature marks the test as "gate test" for feature X. Such tests are supposed to ensure that the compiler errors when usage of a gated feature is attempted without the proper #![feature(X)] tag. Each unstable lang feature is required to have a gate test.
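The one-or-two-argument rule for ignore-llvm-version in the list above can be expressed as a small range check. The sketch below assumes versions are represented as (major, minor) pairs; both that representation and the function name llvm_version_ignored are assumptions made for illustration.

```rust
/// Hypothetical helper mirroring the `ignore-llvm-version` description:
/// `first` is the first version to ignore and `last`, if given, is the
/// last version to ignore. Versions are (major, minor) pairs here.
fn llvm_version_ignored(
    actual: (u32, u32),
    first: (u32, u32),
    last: Option<(u32, u32)>,
) -> bool {
    match last {
        // Two arguments: ignore the inclusive range first..=last.
        Some(last) => first <= actual && actual <= last,
        // One argument: ignore `first` and all subsequent versions.
        None => first <= actual,
    }
}
```

Tuples compare lexicographically in Rust, so (6, 5) falls between (6, 0) and (7, 0) as one would expect.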

Error annotations

Error annotations specify the errors that the compiler is expected to emit. They are "attached" to the line in source where the error is located.

  • ~: Associates the following error level and message with the current line
  • ~|: Associates the following error level and message with the same line as the previous comment
  • ~^: Associates the following error level and message with the previous line. Each caret (^) that you add adds a line to this, so ~^^^^^^^ is seven lines up.

The error levels that you can have are:

  1. ERROR
  2. WARNING
  3. NOTE
  4. HELP and SUGGESTION*

* Note: SUGGESTION must follow immediately after HELP.
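The way the ~, ~| and ~^ markers select a source line can be sketched as a small function. This is an illustration of the rules listed above under the assumption that lines are numbered from 1; annotation_target is a hypothetical name and not part of compiletest.

```rust
/// Hypothetical helper: given the 1-based line an annotation comment sits
/// on, the marker that follows `//` (e.g. "~", "~^^", "~|"), and the line
/// targeted by the previous annotation, returns the line the error is
/// attached to. Returns `None` for text that is not a recognized marker.
fn annotation_target(
    comment_line: usize,
    marker: &str,
    previous_target: Option<usize>,
) -> Option<usize> {
    let rest = marker.strip_prefix('~')?;
    if rest == "|" {
        // `~|`: same line as the previous annotation.
        return previous_target;
    }
    if rest.chars().all(|c| c == '^') {
        // `~` alone: the current line; each caret moves one line up.
        return comment_line.checked_sub(rest.len());
    }
    None
}
```

With these rules, a ~^^^^^^^ marker on line 10 refers to line 3, seven lines above the comment.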

Revisions

Certain classes of tests support "revisions" (as of the time of this writing, this includes run-pass, compile-fail, run-fail, and incremental, though incremental tests are somewhat different). Revisions allow a single test file to be used for multiple tests. This is done by adding a special header at the top of the file:


// revisions: foo bar baz

This will result in the test being compiled (and tested) three times, once with --cfg foo, once with --cfg bar, and once with --cfg baz. You can therefore use #[cfg(foo)] etc within the test to tweak each of these results.

You can also customize headers and expected error messages to a particular revision. To do this, add [foo] (or bar, baz, etc) after the // comment, like so:


// A flag to pass in only for cfg `foo`:
//[foo]compile-flags: -Z verbose

#[cfg(foo)]
fn test_foo() {
    let x: usize = 32_u32; //[foo]~ ERROR mismatched types
}

Note that not all headers have meaning when customized to a revision. For example, the ignore-test header (and all "ignore" headers) currently only apply to the test as a whole, not to particular revisions. The only headers that are intended to really work when customized to a revision are error patterns and compiler flags.
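A revision-specific header such as //[foo]compile-flags: -Z verbose only takes effect for the named revision. The following sketch shows one way that selection could work; header_for_revision is a hypothetical function, and it treats every // comment as a potential header, which is a simplification.

```rust
/// Hypothetical helper: returns the header text of a `//` comment line if
/// it applies to `revision`. A `[rev]` prefix restricts the header to that
/// revision; a header without a prefix applies to every revision.
fn header_for_revision<'a>(line: &'a str, revision: &str) -> Option<&'a str> {
    let rest = line.trim_start().strip_prefix("//")?;
    match rest.strip_prefix('[') {
        Some(tagged) => {
            let (rev, header) = tagged.split_once(']')?;
            if rev == revision {
                Some(header.trim())
            } else {
                None
            }
        }
        None => Some(rest.trim()),
    }
}
```

Under this sketch, the line //[foo]compile-flags: -Z verbose yields the header compile-flags: -Z verbose for revision foo and nothing for revision bar.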

Guide to the UI tests

The UI tests are intended to capture the compiler's complete output, so that we can test all aspects of the presentation. They work by compiling a file (e.g., ui/hello_world/main.rs), capturing the output, and then applying some normalization (see below). This normalized result is then compared against reference files named ui/hello_world/main.stderr and ui/hello_world/main.stdout. If either of those files doesn't exist, the output must be empty (that is actually the case for this particular test). If the test run fails, we will print out the current output, but it is also saved in build/<target-triple>/test/ui/hello_world/main.stdout (this path is printed as part of the test failure message), so you can run diff and so forth.
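The comparison rule described above, including the treatment of a missing reference file, boils down to a very small check. The sketch below is illustrative only; output_matches is a hypothetical name, and the reference file is modeled as an optional string rather than read from disk.

```rust
/// Hypothetical helper: compares normalized compiler output against the
/// contents of a reference file. `None` models a reference file that does
/// not exist, in which case the output must be empty.
fn output_matches(normalized_output: &str, reference: Option<&str>) -> bool {
    normalized_output == reference.unwrap_or("")
}
```

For example, a test with no main.stdout file passes the stdout check only if the program printed nothing.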

Tests that do not result in compile errors

By default, a UI test is expected not to compile (in which case, it should contain at least one //~ ERROR annotation). However, you can also make UI tests where compilation is expected to succeed, and you can even run the resulting program. Just add one of the following header commands:

  • // compile-pass – compilation should succeed but do not run the resulting binary
  • // run-pass – compilation should succeed and we should run the resulting binary

Editing and updating the reference files

If you have changed the compiler's output intentionally, or you are making a new test, you can pass --bless to the test subcommand. E.g. if some tests in src/test/ui are failing, you can run

./x.py test --stage 1 src/test/ui --bless

to automatically adjust the .stderr, .stdout or .fixed files of all tests. Of course you can also target just specific tests with the --test-args your_test_name flag, just like when running the tests.

Normalization

The normalization applied is aimed at eliminating differences in output between platforms, mainly in filenames:

  • the test directory is replaced with $DIR
  • all backslashes (\) are converted to forward slashes (/) (for Windows)
  • all CR LF newlines are converted to LF
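The three built-in rules listed above can be sketched with plain string replacement. The order in which they are applied here simply follows the list and is an assumption, as is the function name normalize; the real implementation may differ.

```rust
/// Hypothetical sketch of the three built-in normalizations, applied in
/// the order they are listed (the real ordering is an assumption).
fn normalize(output: &str, test_dir: &str) -> String {
    output
        // The test directory is replaced with `$DIR`.
        .replace(test_dir, "$DIR")
        // Backslashes become forward slashes (for Windows paths).
        .replace('\\', "/")
        // CR LF newlines become LF.
        .replace("\r\n", "\n")
}
```

Under these assumptions, Windows-style output that mentions C:\src\test\ui\foo.rs, with C:\src\test\ui as the test directory, would be normalized to $DIR/foo.rs.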

Sometimes these built-in normalizations are not enough. In such cases, you may provide custom normalization rules using the header commands, e.g.


// normalize-stdout-test: "foo" -> "bar"
// normalize-stderr-32bit: "fn\(\) \(32 bits\)" -> "fn\(\) \($$PTR bits\)"
// normalize-stderr-64bit: "fn\(\) \(64 bits\)" -> "fn\(\) \($$PTR bits\)"

This tells the test that, on 32-bit platforms, whenever the compiler writes fn() (32 bits) to stderr, it should be normalized to read fn() ($PTR bits) instead. The same applies on 64-bit platforms. The replacement is performed by regexes using the default regex flavor provided by the regex crate.

The corresponding reference file will use the normalized output to test both 32-bit and 64-bit platforms:

...
   |
   = note: source type: fn() ($PTR bits)
   = note: target type: u16 (16 bits)
...

Please see ui/transmute/main.rs and main.stderr for a concrete usage example.

Besides normalize-stderr-32bit and -64bit, one may use any target information or stage supported by ignore-X here as well (e.g. normalize-stderr-windows or simply normalize-stderr-test for unconditional replacement).

compiletest

Introduction

compiletest is the main test harness of the Rust test suite. It allows test authors to organize large numbers of tests (the Rust compiler has many thousands), supports efficient test execution (tests can run in parallel), and lets the test author configure the behavior and expected results of both individual tests and groups of tests.

compiletest tests may check test code for success, for failure or in some cases, even failure to compile. Tests are typically organized as a Rust source file with annotations in comments before and/or within the test code, which serve to direct compiletest on if or how to run the test, what behavior to expect, and more. If you are unfamiliar with the compiler testing framework, see this chapter for additional background.

The tests themselves are typically (but not always) organized into "suites" – for example, run-pass, a folder representing tests that should succeed, run-fail, a folder holding tests that should compile successfully, but return a failure (non-zero status), compile-fail, a folder holding tests that should fail to compile, and many more. The various suites are defined in src/tools/compiletest/src/common.rs in the pub struct Config declaration. And a very good introduction to the different suites of compiler tests along with details about them can be found in Adding new tests.

Adding a new test file

Briefly, simply create your new test in the appropriate location under src/test. No registration of test files is necessary as compiletest will scan the src/test subfolder recursively, and will execute any Rust source files it finds as tests. See Adding new tests for a complete guide on how to add new tests.

Header Commands

Source file annotations which appear in comments near the top of the source file before any test code are known as header commands. These commands can instruct compiletest to ignore this test, set expectations on whether it is expected to succeed at compiling, or what the test's return code is expected to be. Header commands (and their inline counterparts, Error Info commands) are described more fully here.

Adding a new header command

Header commands are defined in the TestProps struct in src/tools/compiletest/src/header.rs. At a high level, there are dozens of test properties defined here, all set to default values in the TestProps struct's impl block. Any test can override this default value by specifying the property in question as a header command in a comment (//) in the test source file, before any source code.

Using a header command

Here is an example, specifying the must-compile-successfully header command, which takes no arguments, followed by the failure-status header command, which takes a single argument (which, in this case is a value of 1). failure-status is instructing compiletest to expect a failure status of 1 (rather than the current Rust default of 101 at the time of this writing). The header command and the argument list (if present) are typically separated by a colon:

// must-compile-successfully
// failure-status: 1

#![feature(termination_trait)]

use std::io::{Error, ErrorKind};

fn main() -> Result<(), Box<Error>> {
    Err(Box::new(Error::new(ErrorKind::Other, "returned Box<Error> from main()")))
}

Adding a new header command property

One would add a new header command if there is a need to define some test property or behavior on an individual, test-by-test basis. A header command property serves as the header command's backing store (holds the command's current value) at runtime.

To add a new header command property: 1. Look for the pub struct TestProps declaration in src/tools/compiletest/src/header.rs and add the new public property to the end of the declaration. 2. Look for the impl TestProps implementation block immediately following the struct declaration and initialize the new property to its default value.

Adding a new header command parser

When compiletest encounters a test file, it parses the file a line at a time by calling every parser defined in the Config struct's implementation block, also in src/tools/compiletest/src/header.rs (note that the Config struct's declaration block is found in src/tools/compiletest/src/common.rs). TestProps's load_from() method will try passing the current line of text to each parser, which in turn typically checks whether the line begins with a particular commented (//) header command such as // must-compile-successfully or // failure-status. Whitespace after the comment marker is optional.

Parsers will override a given header command property's default value merely by being specified in the test file as a header command or by having a parameter value specified in the test file, depending on the header command.

Parsers defined in impl Config are typically named parse_<header_command> (note kebab-case <header-command> transformed to snake-case <header_command>). impl Config also defines several 'low-level' parsers which make it simple to parse common patterns like simple presence or not (parse_name_directive()), header-command:parameter(s) (parse_name_value_directive()), optional parsing only if a particular cfg attribute is defined (has_cfg_prefix()) and many more. The low-level parsers are found near the end of the impl Config block; be sure to look through them and their associated parsers immediately above to see how they are used to avoid writing additional parsing code unnecessarily.

As a concrete example, here is the implementation for the parse_failure_status() parser, in src/tools/compiletest/src/header.rs:

@@ -232,6 +232,7 @@ pub struct TestProps {
     // customized normalization rules
     pub normalize_stdout: Vec<(String, String)>,
     pub normalize_stderr: Vec<(String, String)>,
+    pub failure_status: i32,
 }

 impl TestProps {
@@ -260,6 +261,7 @@ impl TestProps {
             run_pass: false,
             normalize_stdout: vec![],
             normalize_stderr: vec![],
+            failure_status: 101,
         }
     }

@@ -383,6 +385,10 @@ impl TestProps {
             if let Some(rule) = config.parse_custom_normalization(ln, "normalize-stderr") {
                 self.normalize_stderr.push(rule);
             }
+
+            if let Some(code) = config.parse_failure_status(ln) {
+                self.failure_status = code;
+            }
         });

         for key in &["RUST_TEST_NOCAPTURE", "RUST_TEST_THREADS"] {
@@ -488,6 +494,13 @@ impl Config {
         self.parse_name_directive(line, "pretty-compare-only")
     }

+    fn parse_failure_status(&self, line: &str) -> Option<i32> {
+        match self.parse_name_value_directive(line, "failure-status") {
+            Some(code) => code.trim().parse::<i32>().ok(),
+            _ => None,
+        }
+    }
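To see the moving parts of the diff above in isolation, here is a minimal, self-contained sketch. It is not the compiletest code: the TestProps struct is cut down to one field, and parse_name_value_directive() is a hypothetical free function that only approximates the real low-level parser of the same name.

```rust
/// Simplified stand-in for compiletest's `TestProps` (one property only).
struct TestProps {
    failure_status: i32,
}

impl TestProps {
    fn new() -> Self {
        // 101 is the default quoted in the surrounding text.
        TestProps { failure_status: 101 }
    }

    /// Pass every line of the test source to the parser; a match overrides the default.
    fn load_from(&mut self, source: &str) {
        for line in source.lines() {
            if let Some(code) = parse_failure_status(line) {
                self.failure_status = code;
            }
        }
    }
}

/// Hypothetical helper: returns the text after `// <directive>:`, if the line has that shape.
/// Whitespace after the comment marker is optional.
fn parse_name_value_directive(line: &str, directive: &str) -> Option<String> {
    let rest = line.trim_start().strip_prefix("//")?.trim_start();
    let value = rest.strip_prefix(directive)?.strip_prefix(':')?;
    Some(value.to_string())
}

/// Mirrors the shape of `parse_failure_status()` in the diff: parse the value as an `i32`.
fn parse_failure_status(line: &str) -> Option<i32> {
    parse_name_value_directive(line, "failure-status")?
        .trim()
        .parse()
        .ok()
}

fn main() {
    let mut props = TestProps::new();
    props.load_from("// must-compile-successfully\n// failure-status: 1\n\nfn main() {}\n");
    println!("failure_status = {}", props.failure_status);
}
```

A test file without the header leaves failure_status at its default; a malformed value (for example // failure-status: abc) is ignored in this sketch rather than reported.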

Implementing the behavior change

When a test invokes a particular header command, it is expected that some behavior will change as a result. What behavior, obviously, will depend on the purpose of the header command. In the case of failure-status, the behavior that changes is that compiletest expects the failure code defined by the header command invoked in the test, rather than the default value.

Although specific to failure-status (as every header command will have a different implementation in order to invoke behavior change) perhaps it is helpful to see the behavior change implementation of one case, simply as an example. To implement failure-status, the check_correct_failure_status() function found in the TestCx implementation block, located in src/tools/compiletest/src/runtest.rs, was modified as per below:

@@ -295,11 +295,14 @@ impl<'test> TestCx<'test> {
     }

     fn check_correct_failure_status(&self, proc_res: &ProcRes) {
-        // The value the rust runtime returns on failure
-        const RUST_ERR: i32 = 101;
-        if proc_res.status.code() != Some(RUST_ERR) {
+        let expected_status = Some(self.props.failure_status);
+        let received_status = proc_res.status.code();
+
+        if expected_status != received_status {
             self.fatal_proc_rec(
-                &format!("failure produced the wrong error: {}", proc_res.status),
+                &format!("Error: expected failure status ({:?}) but received status {:?}.",
+                         expected_status,
+                         received_status),
                 proc_res,
             );
         }
@@ -320,7 +323,6 @@ impl<'test> TestCx<'test> {
         );

         let proc_res = self.exec_compiled_test();
-
         if !proc_res.status.success() {
             self.fatal_proc_rec("test run failed!", &proc_res);
         }
@@ -499,7 +501,6 @@ impl<'test> TestCx<'test> {
                 expected,
                 actual
             );
-            panic!();
         }
     }

Note the use of self.props.failure_status to access the header command property. In tests which do not specify the failure status header command, self.props.failure_status will evaluate to the default value of 101 at the time of this writing. But for a test which specifies a header command of, for example, // failure-status: 1, self.props.failure_status will evaluate to 1, as parse_failure_status() will have overridden the TestProps default value, for that test specifically.
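The comparison itself can be tried outside compiletest. The following is only a sketch: TestProps is reduced to a single field, and the function returns a Result instead of calling fatal_proc_rec(), which is an assumption made to keep the example standalone.

```rust
/// Simplified stand-in for the relevant part of compiletest's `TestProps`.
struct TestProps {
    failure_status: i32,
}

/// Compare the expected status with what the process reported.
/// `received` is `None` when no exit code is available, as with
/// `std::process::ExitStatus::code()` for a process killed by a signal.
fn check_correct_failure_status(props: &TestProps, received: Option<i32>) -> Result<(), String> {
    let expected = Some(props.failure_status);
    if expected == received {
        Ok(())
    } else {
        Err(format!(
            "Error: expected failure status ({:?}) but received status {:?}.",
            expected, received
        ))
    }
}

fn main() {
    let default_props = TestProps { failure_status: 101 };
    let overridden = TestProps { failure_status: 1 };
    // A test without the header expects 101; one with `// failure-status: 1` expects 1.
    println!("{:?}", check_correct_failure_status(&default_props, Some(101)));
    println!("{:?}", check_correct_failure_status(&overridden, Some(101)));
}
```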

Note: This is copied from the rust-forge. If anything needs updating, please open an issue or make a PR on the github repo.

Debugging the compiler

Here are a few tips to debug the compiler:

Getting a backtrace

When you have an ICE (panic in the compiler), you can set RUST_BACKTRACE=1 to get the stack trace of the panic! like in normal Rust programs. IIRC backtraces don't work on Mac and on MinGW, sorry. If you have trouble or the backtraces are full of unknown, you might want to find some way to use Linux or MSVC on Windows.

In the default configuration, you don't have line numbers enabled, so the backtrace looks like this:

stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::begin_panic
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
  32: rustc_typeck::check_crate
  33: <std::thread::local::LocalKey<T>>::with
  34: <std::thread::local::LocalKey<T>>::with
  35: rustc::ty::context::TyCtxt::create_and_enter
  36: rustc_driver::driver::compile_input
  37: rustc_driver::run_compiler

If you want line numbers for the stack trace, you can enable debuginfo-lines=true or debuginfo=true in your config.toml and rebuild the compiler. Then the backtrace will look like this:

stack backtrace:
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
             at /home/user/rust/src/librustc_typeck/check/cast.rs:110
   7: rustc_typeck::check::cast::CastCheck::check
             at /home/user/rust/src/librustc_typeck/check/cast.rs:572
             at /home/user/rust/src/librustc_typeck/check/cast.rs:460
             at /home/user/rust/src/librustc_typeck/check/cast.rs:370
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
  33: rustc_driver::driver::compile_input
             at /home/user/rust/src/librustc_driver/driver.rs:1010
             at /home/user/rust/src/librustc_driver/driver.rs:212
  34: rustc_driver::run_compiler
             at /home/user/rust/src/librustc_driver/lib.rs:253

Getting a backtrace for errors

If you want to get a backtrace to the point where the compiler emits an error message, you can pass the -Z treat-err-as-bug flag, which will make the compiler panic on the first error it sees.

This can also help when debugging delay_span_bug calls - it will make the first delay_span_bug call panic, which will give you a useful backtrace.

For example:

$ cat error.rs
fn main() {
    1 + ();
}
$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc error.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
 --> error.rs:2:7
  |
2 |     1 + ();
  |       ^ no implementation for `{integer} + ()`
  |
  = help: the trait `std::ops::Add<()>` is not implemented for `{integer}`

error: aborting due to previous error

$ # Now, where does the error above come from?
$ RUST_BACKTRACE=1 \
    ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc \
    error.rs \
    -Z treat-err-as-bug
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
 --> error.rs:2:7
  |
2 |     1 + ();
  |       ^ no implementation for `{integer} + ()`
  |
  = help: the trait `std::ops::Add<()>` is not implemented for `{integer}`

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu

note: run with `RUST_BACKTRACE=1` for a backtrace

thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
/home/user/rust/src/librustc_errors/lib.rs:411:12
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
stack backtrace:
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
   7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
             'tcx>>::report_selection_error
             at /home/user/rust/src/librustc/traits/error_reporting.rs:823
   8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
             'tcx>>::report_fulfillment_errors
             at /home/user/rust/src/librustc/traits/error_reporting.rs:160
             at /home/user/rust/src/librustc/traits/error_reporting.rs:112
   9: rustc_typeck::check::FnCtxt::select_obligations_where_possible
             at /home/user/rust/src/librustc_typeck/check/mod.rs:2192
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
  36: rustc_driver::run_compiler
             at /home/user/rust/src/librustc_driver/lib.rs:253
$ # Cool, now I have a backtrace for the error

Getting logging output

The compiler has a lot of debug! calls, which print out logging information at many points. These are very useful to at least narrow down the location of a bug if not to find it entirely, or just to orient yourself as to why the compiler is doing a particular thing.

To see the logs, you need to set the RUST_LOG environment variable to your log filter, e.g. to get the logs for a specific module, you can run the compiler as RUST_LOG=module::path rustc my-file.rs. The Rust logs are powered by env-logger, and you can look at the docs linked there to see the full RUST_LOG syntax. All debug! output will then appear in standard error.

Note that unless you use a very strict filter, the logger will emit a lot of output - so it's typically a good idea to pipe standard error to a file and look at the log output with a text editor.

So, to put it together:

# This puts the output of all debug calls in `librustc/traits` into
# standard error, which might fill your console backscroll.
$ RUST_LOG=rustc::traits rustc +local my-file.rs

# This puts the output of all debug calls in `librustc/traits` in
# `traits-log`, so you can then see it with a text editor.
$ RUST_LOG=rustc::traits rustc +local my-file.rs 2>traits-log

# Not recommended. This will show the output of all `debug!` calls
# in the Rust compiler, and there are a *lot* of them, so it will be
# hard to find anything.
$ RUST_LOG=debug rustc +local my-file.rs 2>all-log

# This will show the output of all `info!` calls in `rustc_trans`.
#
# There's an `info!` statement in `trans_instance` that outputs
# every function that is translated. This is useful to find out
# which function triggers an LLVM assertion, and this is an `info!`
# log rather than a `debug!` log so it will work on the official
# compilers.
$ RUST_LOG=rustc_trans=info rustc +local my-file.rs
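If the filter syntax in those commands feels opaque, the following sketch models the three forms used above (a module path, a bare level, and path=level). It is a simplification written for this guide, not env-logger's implementation: the Directive type and the "longest matching prefix wins" rule are assumptions, and the real crate supports more syntax than this.

```rust
/// Log levels, from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// One comma-separated piece of the filter: optional module prefix plus the
/// most verbose level it lets through.
struct Directive {
    prefix: Option<String>,
    max_level: Level,
}

fn parse_level(text: &str) -> Option<Level> {
    match text.to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

fn parse_filter(spec: &str) -> Vec<Directive> {
    spec.split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(|part| match part.split_once('=') {
            // `path=level`
            Some((path, level)) => Directive {
                prefix: Some(path.to_string()),
                max_level: parse_level(level).unwrap_or(Level::Trace),
            },
            None => match parse_level(part) {
                // bare `level`: applies to every module
                Some(level) => Directive { prefix: None, max_level: level },
                // bare `path`: everything from that module
                None => Directive { prefix: Some(part.to_string()), max_level: Level::Trace },
            },
        })
        .collect()
}

/// Would a log call at `level` in `module` be printed? The longest matching prefix decides.
fn enabled(filter: &[Directive], module: &str, level: Level) -> bool {
    filter
        .iter()
        .filter(|d| d.prefix.as_deref().map_or(true, |p| module.starts_with(p)))
        .max_by_key(|d| d.prefix.as_ref().map_or(0, |p| p.len()))
        .map_or(false, |d| level <= d.max_level)
}

fn main() {
    let filter = parse_filter("rustc_trans=info");
    println!("{}", enabled(&filter, "rustc_trans::base", Level::Info));
    println!("{}", enabled(&filter, "rustc_trans::base", Level::Debug));
}
```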

While calls to info! are included in every build of the compiler, calls to debug! are only included if debug-assertions=yes is turned on in config.toml (it is turned off by default). So if you don't see DEBUG logs, especially if you run the compiler with RUST_LOG=rustc rustc some.rs and only see INFO logs, make sure that debug-assertions=yes is turned on in your config.toml.

I also think that in some cases just setting it will not trigger a rebuild, so if you changed it and you already have a compiler built, you might want to call x.py clean to force one.

Logging etiquette

Because calls to debug! are removed by default, in most cases, don't worry about adding "unnecessary" calls to debug! and leaving them in code you commit - they won't slow down the performance of what we ship, and if they helped you pin down a bug, they will probably help someone else with a different one.

However, there are still a few concerns that you might care about:

Expensive operations in logs

A note of caution: the expressions within the debug! call are run whenever RUST_LOG is set, even if the filter would exclude the log. This means that if in the module rustc::foo you have a statement

debug!("{:?}", random_operation(tcx));

Then if someone runs a debug rustc with RUST_LOG=rustc::bar, random_operation() will still run, even though its output will never be needed!

This means that you should not put anything too expensive or likely to crash there - that would annoy anyone who wants to use logging for their own module. Note that if RUST_LOG is unset (the default), then the code will not run - this means that if your logging code panics, then no-one will know it until someone tries to use logging to find another bug.

If you need to do an expensive operation in a log, be aware that while log expressions are evaluated even if logging is not enabled in your module, they are not formatted unless it is. This means you can put your expensive/crashy operations inside an fmt::Debug impl, and they will not be run unless your log is enabled:

use std::fmt;

struct ExpensiveOperationContainer<'a, 'gcx, 'tcx>
    where 'tcx: 'gcx, 'a: 'tcx
{
    tcx: TyCtxt<'a, 'gcx, 'tcx>
}

impl<'a, 'gcx, 'tcx> fmt::Debug for ExpensiveOperationContainer<'a, 'gcx, 'tcx> {
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        // `tcx` is a field of the container, so it must be reached through `self`.
        let value = random_operation(self.tcx);
        fmt::Debug::fmt(&value, fmt)
    }
}

debug!("{:?}", ExpensiveOperationContainer { tcx });
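The snippet above depends on compiler internals, so here is a standalone sketch of the same trick that you can run anywhere. Everything in it is made up for illustration: Lazy plays the role of ExpensiveOperationContainer, a counter stands in for random_operation(), and log_if() is a hypothetical substitute for debug! that evaluates its argument eagerly but formats it only when enabled.

```rust
use std::cell::Cell;
use std::fmt;

/// Wrapper whose `Debug` impl does the costly work only when it is actually formatted.
struct Lazy<'a> {
    calls: &'a Cell<u32>,
}

impl<'a> fmt::Debug for Lazy<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Stands in for the expensive operation; we just count how often it runs.
        self.calls.set(self.calls.get() + 1);
        write!(f, "expensive result")
    }
}

/// Hypothetical stand-in for `debug!`: the argument is always constructed,
/// but it is only formatted when logging is enabled for this module.
fn log_if(enabled: bool, value: &dyn fmt::Debug) -> Option<String> {
    if enabled {
        Some(format!("{:?}", value))
    } else {
        None
    }
}

fn main() {
    let calls = Cell::new(0);
    log_if(false, &Lazy { calls: &calls });
    println!("runs while filtered out: {}", calls.get());
    log_if(true, &Lazy { calls: &calls });
    println!("runs once enabled: {}", calls.get());
}
```

Constructing the wrapper is cheap; the cost is deferred into fmt(), which only runs on the enabled path.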

Formatting Graphviz output (.dot files)

Some compiler options for debugging specific features yield graphviz graphs - e.g. the #[rustc_mir(borrowck_graphviz_postflow="suffix.dot")] attribute dumps various borrow-checker dataflow graphs.

These all produce .dot files. To view these files, install graphviz (e.g. apt-get install graphviz) and then run the following commands:

$ dot -T pdf maybe_init_suffix.dot > maybe_init_suffix.pdf
$ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer

Debugging LLVM

NOTE: If you are looking for info about code generation, please see this chapter instead.

This section is about debugging compiler bugs in code generation (e.g. why the compiler generated some piece of code or crashed in LLVM). LLVM is a big project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context:

As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to

  1. create a new crate that reproduces the issue (e.g. adding whatever crate is at fault as a dependency, and using it from there)

  2. minimize the crate by removing external dependencies; that is, moving everything relevant to the new crate

  3. further minimize the issue by making the code shorter (there are tools that help with this like creduce)

The official compilers (including nightlies) have LLVM assertions disabled, which means that LLVM assertion failures can show up as compiler crashes (not ICEs but "real" crashes) and other sorts of weird behavior. If you are encountering these, it is a good idea to try using a compiler with LLVM assertions enabled - either an "alt" nightly or a compiler you build yourself by setting [llvm] assertions=true in your config.toml - and see whether anything turns up.

The rustc build process builds the LLVM tools into ./build/<host-triple>/llvm/bin. They can be called directly.

The default rustc compilation pipeline has multiple codegen units, which is hard to replicate manually and means that LLVM is called multiple times in parallel. If you can get away with it (i.e. if it doesn't make your bug disappear), passing -C codegen-units=1 to rustc will make debugging easier.

To get rustc to generate LLVM IR, you need to pass the --emit=llvm-ir flag. If you are building via cargo, use the RUSTFLAGS environment variable (e.g. RUSTFLAGS='--emit=llvm-ir'). This causes rustc to spit out LLVM IR into the target directory.

cargo llvm-ir [options] path spits out the LLVM IR for a particular function at path. (cargo install cargo-asm installs cargo asm and cargo llvm-ir). --build-type=debug emits code for debug builds. There are also other useful options. Also, debug info in LLVM IR can clutter the output a lot: RUSTFLAGS="-C debuginfo=0" is really useful.

RUSTFLAGS="-C save-temps" outputs LLVM bitcode (not the same as IR) at different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to .ll files using llvm-dis, which should be available in your local build of rustc.

If you want to play with the optimization pipeline, you can use the opt tool from ./build/<host-triple>/llvm/bin/ with the LLVM IR emitted by rustc. Note that rustc emits different IR depending on whether -O is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should:

$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
    -C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my

If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag -C llvm-args=-print-after-all, and possibly add -C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME' (e.g. -C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE').

That produces a lot of output into standard error, so you'll want to pipe that to some file. Also, if you are using neither -filter-print-funcs nor -C codegen-units=1, then, because the multiple codegen units run in parallel, the printouts will mix together and you won't be able to read anything.

If you want just the IR for a specific function (say, you want to see why it causes an assertion or doesn't optimize correctly), you can use llvm-extract, e.g.

$ ./build/$TRIPLE/llvm/bin/llvm-extract \
    -func='_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE' \
    -S \
    < unextracted.ll \
    > extracted.ll

Filing LLVM bug reports

When filing an LLVM bug report, you will probably want some sort of minimal working example that demonstrates the problem. The Godbolt compiler explorer is really helpful for this.

  1. Once you have some LLVM IR for the problematic code (see above), you can create a minimal working example with Godbolt. Go to gcc.godbolt.org.

  2. Choose LLVM-IR as programming language.

  3. Use llc to compile the IR to a particular target as is:

    • There are some useful flags: -mattr enables target features, -march= selects the target, -mcpu= selects the CPU, etc.
    • Commands like llc -march=help output all architectures available, which is useful because sometimes the Rust arch names and the LLVM names do not match.
    • If you have compiled rustc yourself somewhere, in the target directory you have binaries for llc, opt, etc.
  4. If you want to optimize the LLVM-IR, you can use opt to see how the LLVM optimizations transform it.

  5. Once you have a Godbolt link demonstrating the issue, it is pretty easy to file an LLVM bug report.

Narrowing (Bisecting) Regressions

The cargo-bisect-rustc tool can be used as a quick and easy way to find exactly which PR caused a change in rustc behavior. It automatically downloads rustc PR artifacts and tests them against a project you provide until it finds the regression. You can then look at the PR to get more context on why it was changed. See this tutorial on how to use it.

Profiling the compiler

This discussion talks about how to profile the compiler and find out where it spends its time. If you just want to get a general overview, it is often a good idea to just add the -Zself-profile option to the rustc command line. This will break down time spent into various categories. But if you want a more detailed look, you probably want to break out a custom profiler.

Profiling with perf

This is a guide for how to profile rustc with perf.

Initial steps

  • Get a clean checkout of rust-lang/master, or whatever it is you want to profile.
  • Set the following settings in your config.toml:
    • debuginfo-lines = true
    • use-jemalloc = false — lets you do memory use profiling with valgrind
    • leave everything else the defaults
  • Run ./x.py build to get a full build
  • Make a rustup toolchain pointing to that result

Gathering a perf profile

perf is an excellent tool on Linux that can be used to gather and analyze all kinds of information. Mostly it is used to figure out where a program spends its time. It can also be used for other sorts of events, though, like cache misses and so forth.

The basics

The basic perf command is this:

> perf record -F99 --call-graph dwarf XXX

The -F99 tells perf to sample at 99 Hz, which avoids generating too much data for longer runs (why 99 Hz you ask? It is often chosen because it is unlikely to be in lockstep with other periodic activity). The --call-graph dwarf tells perf to get call-graph information from debuginfo, which is accurate. The XXX is the command you want to profile. So, for example, you might do:

> perf record -F99 --call-graph dwarf cargo +<toolchain> rustc

to run cargo -- here <toolchain> should be the name of the toolchain you made in the beginning. But there are some things to be aware of:

  • You probably don't want to profile the time spent building dependencies. So something like cargo build; cargo clean -p $C may be helpful (where $C is the crate name)
    • Though usually I just do touch src/lib.rs and rebuild instead. =)
  • You probably don't want incremental messing about with your profile. So something like CARGO_INCREMENTAL=0 can be helpful.

Gathering a perf profile from a perf.rust-lang.org test

Often we want to analyze a specific test from perf.rust-lang.org. To do that, the first step is to clone the rustc-perf repository:

> git clone https://github.com/rust-lang-nursery/rustc-perf

Doing it the easy way

Once you've cloned the repo, you can use the collector executable to do profiling for you! You can find instructions in the rustc-perf readme.

For example, to measure the clap-rs test, you might do:

> ./target/release/collector
    --output-repo /path/to/place/output
    profile perf-record
    --rustc /path/to/rustc/executable/from/your/build/directory
    --cargo `which cargo`
    --filter clap-rs
    --builds Check

You can also use that same command to use cachegrind or other profiling tools.

Doing it the hard way

If you prefer to run things manually, that is also possible. You first need to find the source for the test you want. Sources for the tests are found in the collector/benchmarks directory. So let's go into the directory of a specific test; we'll use clap-rs as an example:

> cd collector/benchmarks/clap-rs

In this case, let's say we want to profile the cargo check performance. In that case, I would first run some basic commands to build the dependencies:

# Setup: first clean out any old results and build the dependencies:
> cargo +<toolchain> clean
> CARGO_INCREMENTAL=0 cargo +<toolchain> check

(Again, <toolchain> should be replaced with the name of the toolchain we made in the first step.)

Next: we want to record the execution time for just the clap-rs crate, running cargo check. I tend to use cargo rustc for this, since it also allows me to add explicit flags, which we'll do later on.

> touch src/lib.rs
> CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib

Note that final command: it's a doozy! It uses the cargo rustc command, which executes rustc with (potentially) additional options; the --profile check and --lib options specify that we are doing a cargo check execution, and that this is a library (not a binary).

At this point, we can use perf tooling to analyze the results. For example:

> perf report

will open up an interactive TUI program. In simple cases, that can be helpful. For more detailed examination, the perf-focus tool can be helpful; it is covered below.

A note of caution. Each of the rustc-perf tests is its own special snowflake. In particular, some of them are not libraries, in which case you would want to do touch src/main.rs and avoid passing --lib. I'm not sure how best to tell which test is which to be honest.

Gathering NLL data

If you want to profile an NLL run, you can just pass extra options to the cargo rustc command, like so:

> touch src/lib.rs
> CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib -- -Zborrowck=mir

Analyzing a perf profile with perf focus

Once you've gathered a perf profile, we want to get some information about it. For this, I personally use perf focus. It's a kind of simple but useful tool that lets you answer queries like:

  • "how much time was spent in function F" (no matter where it was called from)
  • "how much time was spent in function F when it was called from G"
  • "how much time was spent in function F excluding time spent in G"
  • "what functions does F call and how much time does it spend in them"

To understand how it works, you have to know just a bit about perf. Basically, perf works by sampling your process on a regular basis (or whenever some event occurs). For each sample, perf gathers a backtrace. perf focus lets you write a regular expression that tests which functions appear in that backtrace, and then tells you which percentage of samples had a backtrace that met the regular expression. It's probably easiest to explain by walking through how I would analyze NLL performance.

Installing perf-focus

You can install perf-focus using cargo install:

> cargo install perf-focus

Example: How much time is spent in MIR borrowck?

Let's say we've gathered the NLL data for a test. We'd like to know how much time it is spending in the MIR borrow-checker. The "main" function of the MIR borrowck is called do_mir_borrowck, so we can do this command:

> perf focus '{do_mir_borrowck}'
Matcher    : {do_mir_borrowck}
Matches    : 228
Not Matches: 542
Percentage : 29%

The '{do_mir_borrowck}' argument is called the matcher. It specifies the test to be applied on the backtrace. In this case, the {X} indicates that there must be some function on the backtrace that meets the regular expression X. In this case, that regex is just the name of the function we want (in fact, it's a subset of the name; the full name includes a bunch of other stuff, like the module path). In this mode, perf-focus just prints out the percentage of samples where do_mir_borrowck was on the stack: in this case, 29%.
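The counting behind that output can be sketched in a few lines. This is not perf-focus's code: samples are modelled as plain lists of function names, a substring test stands in for the regular expression, and the truncating integer division is an assumption that happens to reproduce the figures shown in this chapter (228 of 770 samples gives 29%).

```rust
/// Does any frame of this backtrace contain `needle`?
/// (A substring test stands in for the matcher's regular expression.)
fn on_stack(sample: &[&str], needle: &str) -> bool {
    sample.iter().any(|frame| frame.contains(needle))
}

/// Returns (matches, not matches) over all samples.
fn focus(samples: &[Vec<&str>], needle: &str) -> (usize, usize) {
    let matches = samples.iter().filter(|s| on_stack(s, needle)).count();
    (matches, samples.len() - matches)
}

/// Percentage of matching samples, truncated to a whole number (assumed rounding mode).
fn percentage(matches: usize, not_matches: usize) -> usize {
    let total = matches + not_matches;
    if total == 0 {
        0
    } else {
        matches * 100 / total
    }
}

fn main() {
    // Two made-up backtraces, outermost caller first.
    let samples = vec![
        vec!["main", "rustc_mir::borrow_check::do_mir_borrowck", "compute_regions"],
        vec!["main", "rustc_typeck::check_crate"],
    ];
    let (hit, miss) = focus(&samples, "do_mir_borrowck");
    println!("Matches: {}, Not Matches: {}, Percentage: {}%", hit, miss, percentage(hit, miss));
    // The counts quoted in the text above.
    println!("{}%", percentage(228, 542));
}
```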

A note about c++filt. To get the data from perf, perf focus currently executes perf script (perhaps there is a better way...). I've sometimes found that perf script outputs C++ mangled names. This is annoying. You can tell by running perf script | head yourself — if you see names like 5rustc6middle instead of rustc::middle, then you have the same problem. You can solve this by doing:

> perf script | c++filt | perf focus --from-stdin ...

This will pipe the output from perf script through c++filt and should mostly convert those names into a more friendly format. The --from-stdin flag to perf focus tells it to get its data from stdin, rather than executing perf script. We should make this more convenient (at worst, maybe add a c++filt option to perf focus, or just always use it; it's pretty harmless).
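For the curious, names like 5rustc6middle are a sequence of length-prefixed path segments. The sketch below decodes just that part, under the assumption that the input follows this simple scheme (optionally wrapped in _ZN ... E); it does not handle escapes such as $LT$ or strip the trailing hash segment, which real demanglers like c++filt take care of. The function name decode_segments is invented for this example.

```rust
/// Turn `5rustc6middle` (optionally wrapped as `_ZN...E`) into `rustc::middle`.
/// Returns `None` if the input does not follow the length-prefixed scheme.
fn decode_segments(mangled: &str) -> Option<String> {
    let mut rest = mangled.strip_prefix("_ZN").unwrap_or(mangled);
    let mut parts = Vec::new();
    while !rest.is_empty() && rest != "E" {
        // Each segment starts with its length in decimal digits...
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        // ...followed by exactly that many bytes of name.
        parts.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }
    Some(parts.join("::"))
}

fn main() {
    println!("{:?}", decode_segments("5rustc6middle"));
}
```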

Example: How much time does MIR borrowck spend solving traits?

Perhaps we'd like to know how much time MIR borrowck spends in the trait checker. We can ask this using a more complex regex:

> perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher    : {do_mir_borrowck},..{^rustc::traits}
Matches    : 12
Not Matches: 1311
Percentage : 0%

Here we used the .. operator to ask "how often do we have do_mir_borrowck on the stack and then, later, some function whose name begins with rustc::traits?" (basically, code in that module). It turns out the answer is "almost never": only 12 samples fit that description (if you ever see no samples, that often indicates your query is messed up).

If you're curious, you can find out exactly which samples by using the --print-match option. This will print out the full backtrace for each sample. The | at the front of the line indicates the part that the regular expression matched.
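As a rough model of what the .. operator tests on a single sample, consider the sketch below. It assumes a backtrace is a list of function names ordered from outermost caller to innermost callee, uses a substring test for the first matcher and a prefix test for the ^-anchored one, and is not taken from perf-focus itself.

```rust
/// True if some frame contains `first` and a frame deeper in the stack
/// starts with `later_prefix` (modelling `{first}..{^later_prefix}`).
fn matches_then(sample: &[&str], first: &str, later_prefix: &str) -> bool {
    match sample.iter().position(|frame| frame.contains(first)) {
        // Only frames after the first match count as "later".
        Some(i) => sample[i + 1..]
            .iter()
            .any(|frame| frame.starts_with(later_prefix)),
        None => false,
    }
}

fn main() {
    // Made-up backtraces, outermost caller first.
    let in_order = ["main", "rustc_mir::do_mir_borrowck", "rustc::traits::select"];
    let reversed = ["main", "rustc::traits::select", "rustc_mir::do_mir_borrowck"];
    println!("{}", matches_then(&in_order, "do_mir_borrowck", "rustc::traits"));
    println!("{}", matches_then(&reversed, "do_mir_borrowck", "rustc::traits"));
}
```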

Example: Where does MIR borrowck spend its time?

Often we want to do more "explorational" queries. Like, we know that MIR borrowck is 29% of the time, but where does that time get spent? For that, the --tree-callees option is often the best tool. You usually also want to give --tree-min-percent or --tree-max-depth. The result looks like this:

> perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 43%

Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)

What happens with --tree-callees is that

  • we find each sample matching the regular expression
  • we look at the code that occurs after the regex match and try to build up a call tree

The --tree-min-percent 3 option says "only show me things that take more than 3% of the time". Without this, the tree often gets really noisy and includes random stuff like the innards of malloc. --tree-max-depth can be useful too; it just limits how many levels we print.

For each line, we display the percent of time in that function altogether ("total") and the percent of time spent in just that function and not some callee of that function ("self"). Usually "total" is the more interesting number, but not always.
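One way to picture the two numbers: with sampled backtraces, "total" counts every sample in which the function appears anywhere on the stack, while "self" counts only those where it is the innermost frame. The sketch below uses that definition on made-up data; it is an illustration of the idea, not perf-focus's tree-building code.

```rust
/// Returns (total, self) sample counts for `func`.
/// Each sample is a backtrace ordered from outermost caller to innermost callee.
fn total_and_self(samples: &[Vec<&str>], func: &str) -> (usize, usize) {
    // "total": the function is somewhere on the stack.
    let total = samples
        .iter()
        .filter(|s| s.iter().any(|frame| *frame == func))
        .count();
    // "self": the function is the innermost frame, so the time is its own.
    let own = samples
        .iter()
        .filter(|s| s.last().map_or(false, |frame| *frame == func))
        .count();
    (total, own)
}

fn main() {
    let samples = vec![
        vec!["main", "visit_mir", "visit_statement"],
        vec!["main", "visit_mir"],
        vec!["main", "do_dataflow"],
    ];
    println!("{:?}", total_and_self(&samples, "visit_mir"));
}
```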

Relative percentages

By default, all percentages in perf-focus are relative to the total program execution. This is useful to help you keep perspective: often as we drill down to find hot spots, we can lose sight of the fact that, in terms of overall program execution, this "hot spot" is actually not important. It also ensures that percentages between different queries are easily compared against one another.

That said, sometimes it's useful to get relative percentages, so perf focus offers a --relative option. In this case, the percentages are listed only for samples that match (vs all samples). So for example we could get our percentages relative to the borrowck itself like so:

> perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]

Here you see that compute_regions came up as "47% total" — that means that 47% of do_mir_borrowck is spent in that function. Before, we saw 20% — that's because do_mir_borrowck itself is only 43% of the total time (and .47 * .43 = .20).
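The arithmetic behind that conversion can be sketched in a few lines. This is only an illustration of the relationship between the two outputs above; the function names are made up for this example and are not part of perf-focus.

```rust
/// Share of all samples that matched, as a percentage.
/// (Hypothetical helper; perf-focus prints this as "Percentage".)
fn match_percentage(matches: u32, not_matches: u32) -> f64 {
    100.0 * f64::from(matches) / f64::from(matches + not_matches)
}

/// Converts a percentage that is relative to the matched samples
/// into a percentage of the whole program execution.
fn relative_to_absolute(relative_pct: f64, subset_pct: f64) -> f64 {
    relative_pct * subset_pct / 100.0
}

fn main() {
    // 577 matches and 746 non-matches, as in the output above.
    let borrowck = match_percentage(577, 746);
    // compute_regions is 47% of do_mir_borrowck in the --relative output.
    let compute_regions = relative_to_absolute(47.0, borrowck);
    println!("do_mir_borrowck: {:.1}% of all samples", borrowck);
    println!("compute_regions: {:.1}% of all samples", compute_regions);
}
```

The small difference from the 20% shown earlier comes from rounding in the displayed percentages.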

High-level overview of the compiler source

Crate structure

The main Rust repository consists of a src directory, under which there live many crates. These crates contain the sources for the standard library and the compiler. This document, of course, focuses on the latter.

Rustc consists of a number of crates, including syntax, rustc, rustc_back, rustc_codegen, rustc_driver, and many more. The source for each crate can be found in a directory like src/libXXX, where XXX is the crate name.

(N.B. The names and divisions of these crates are not set in stone and may change over time. For the time being, we tend towards a finer-grained division to help with compilation time, though as incremental compilation improves, that may change.)

The dependency structure of these crates is roughly a diamond:

                  rustc_driver
                /      |       \
              /        |         \
            /          |           \
          /            v             \
rustc_codegen  rustc_borrowck   ...  rustc_metadata
          \            |            /
            \          |          /
              \        |        /
                \      v      /
                    rustc
                       |
                       v
                    syntax
                    /    \
                  /       \
           syntax_pos  syntax_ext

The rustc_driver crate, at the top of this lattice, is effectively the "main" function for the Rust compiler. It doesn't have much "real code", but instead ties together all of the code defined in the other crates and defines the overall flow of execution. (As we transition more and more to the query model, however, the "flow" of compilation is becoming less centrally defined.)

At the other extreme, the rustc crate defines the common and pervasive data structures that all the rest of the compiler uses (e.g. how to represent types, traits, and the program itself). It also contains some amount of the compiler itself, although that is relatively limited.

Finally, all the crates in the bulge in the middle define the bulk of the compiler – they all depend on rustc, so that they can make use of the various types defined there, and they export public routines that rustc_driver will invoke as needed (more and more, what these crates export are "query definitions", but those are covered later on).

Below rustc lie various crates that make up the parser and error reporting mechanism. For historical reasons, these crates do not have the rustc_ prefix, but they are really just as much an internal part of the compiler and not intended to be stable (though they do wind up getting used by some crates in the wild; a practice we hope to gradually phase out).

Each crate has a README.md file that describes, at a high level, what it contains, and tries to give some kind of explanation (some better than others).

The main stages of compilation

The Rust compiler is in a bit of transition right now. It used to be a purely "pass-based" compiler, where we ran a number of passes over the entire program, and each did a particular check or transformation. We are gradually replacing this pass-based code with an alternative setup based on on-demand queries. In the query model, we work backwards, executing a query that expresses our ultimate goal (e.g. "compile this crate"). This query in turn may make other queries (e.g. "get me a list of all modules in the crate"). Those queries make other queries that ultimately bottom out in the base operations, like parsing the input, running the type-checker, and so forth. This on-demand model permits us to do exciting things like only do the minimal amount of work needed to type-check a single function. It also helps with incremental compilation. (For details on defining queries, check out the query model.)

Regardless of the general setup, the basic operations that the compiler must perform are the same. The only thing that changes is whether these operations are invoked front-to-back, or on demand. In order to compile a Rust crate, these are the general steps that we take:

  1. Parsing input
    • this processes the .rs files and produces the AST ("abstract syntax tree")
    • the AST is defined in src/libsyntax/ast.rs. It is intended to match the lexical syntax of the Rust language quite closely.
  2. Name resolution, macro expansion, and configuration
    • once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also processes #[cfg] nodes, and hence may strip things out of the AST as well.
  3. Lowering to HIR
    • Once name resolution completes, we convert the AST into the HIR, or "high-level intermediate representation". The HIR is defined in src/librustc/hir/; that module also includes the lowering code.
    • The HIR is a lightly desugared variant of the AST. It is more processed than the AST and more suitable for the analyses that follow. It is not required to match the syntax of the Rust language.
    • As a simple example, in the AST, we preserve the parentheses that the user wrote, so ((1 + 2) + 3) and 1 + 2 + 3 parse into distinct trees, even though they are equivalent. In the HIR, however, parentheses nodes are removed, and those two expressions are represented in the same way.
  4. Type-checking and subsequent analyses
    • An important step in processing the HIR is to perform type checking. This process assigns types to every HIR expression, for example, and also is responsible for resolving some "type-dependent" paths, such as field accesses (x.f – we can't know what field f is being accessed until we know the type of x) and associated type references (T::Item – we can't know what type Item is until we know what T is).
    • Type checking creates "side-tables" (TypeckTables) that include the types of expressions, the way to resolve methods, and so forth.
    • After type-checking, we can do other analyses, such as privacy checking.
  5. Lowering to MIR and post-processing
    • Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which is a very desugared version of Rust, well suited to borrowck but also to certain high-level optimizations.
  6. Translation to LLVM and LLVM optimizations
    • From MIR, we can produce LLVM IR.
    • LLVM then runs its various optimizations, which produces a number of .o files (one for each "codegen unit").
  7. Linking
    • Finally, those .o files are linked together.
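The parenthesis example from step 3 can be made concrete with a toy model. The sketch below is not the real AST or HIR; AstExpr, HirExpr, and lower are hypothetical types and functions invented for illustration. It only shows the idea that a node kind present in the AST can disappear during lowering.

```rust
/// A toy AST that keeps the parentheses the user wrote.
#[derive(Debug, PartialEq)]
enum AstExpr {
    Lit(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
    Paren(Box<AstExpr>),
}

/// A toy HIR with no parenthesis node at all.
#[derive(Debug, PartialEq)]
enum HirExpr {
    Lit(i64),
    Add(Box<HirExpr>, Box<HirExpr>),
}

/// Lowers the toy AST to the toy HIR, dropping `Paren` wrappers.
fn lower(expr: &AstExpr) -> HirExpr {
    match expr {
        AstExpr::Lit(n) => HirExpr::Lit(*n),
        AstExpr::Add(l, r) => HirExpr::Add(Box::new(lower(l)), Box::new(lower(r))),
        AstExpr::Paren(inner) => lower(inner),
    }
}

fn lit(n: i64) -> Box<AstExpr> {
    Box::new(AstExpr::Lit(n))
}

/// The tree for `((1 + 2) + 3)`.
fn parenthesized() -> AstExpr {
    AstExpr::Paren(Box::new(AstExpr::Add(
        Box::new(AstExpr::Paren(Box::new(AstExpr::Add(lit(1), lit(2))))),
        lit(3),
    )))
}

/// The tree for `1 + 2 + 3` (addition is left-associative).
fn plain() -> AstExpr {
    AstExpr::Add(Box::new(AstExpr::Add(lit(1), lit(2))), lit(3))
}

fn main() {
    // Distinct as ASTs, identical once lowered.
    assert_ne!(parenthesized(), plain());
    assert_eq!(lower(&parenthesized()), lower(&plain()));
    println!("{:?}", lower(&plain()));
}
```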

The Rustc Driver

The rustc_driver is essentially rustc's main() function. It acts as the glue for running the various phases of the compiler in the correct order, managing state such as the SourceMap (maps AST nodes to source code), Session (general build context and error messaging) and the TyCtxt (the "typing context", allowing you to query the type system and other cool stuff). The rustc_driver crate also provides external users with a method for running code at particular times during the compilation process, allowing third parties to effectively use rustc's internals as a library for analysing a crate or emulating the compiler in-process (e.g. the RLS).

For those using rustc as a library, the run_compiler() function is the main entrypoint to the compiler. Its main parameters are a list of command-line arguments and a reference to something which implements the CompilerCalls trait. A CompilerCalls creates the overall CompileController, letting it govern which compiler passes are run and attach callbacks to be fired at the end of each phase.

From rustc_driver's perspective, the main phases of the compiler are:

  1. Parse Input: Initial crate parsing
  2. Configure and Expand: Resolve #[cfg] attributes, name resolution, and expand macros
  3. Run Analysis Passes: Run trait resolution, typechecking, region checking and other miscellaneous analysis passes on the crate
  4. Translate to LLVM: Translate to the in-memory form of LLVM IR and turn it into executable/object files

The CompileController then gives users the ability to inspect the ongoing compilation process

  • after parsing
  • after AST expansion
  • after HIR lowering
  • after analysis, and
  • when compilation is done

The CompileState's various state_after_*() constructors can be inspected to determine what bits of information are available to which callback.

For a more detailed explanation on using rustc_driver, check out the stupid-stats guide by @nrc (attached as Appendix A).

Warning: By its very nature, the internal compiler APIs are always going to be unstable. That said, we do try not to break things unnecessarily.

A Note On Lifetimes

The Rust compiler is a fairly large program containing lots of big data structures (e.g. the AST, HIR, and the type system) and as such, arenas and references are heavily relied upon to minimize unnecessary memory use. This manifests itself in the way people can plug into the compiler, preferring a "push"-style API (callbacks) instead of the more Rust-ic "pull" style (think the Iterator trait).

For example the CompileState, the state passed to callbacks after each phase, is essentially just a box of optional references to pieces inside the compiler. The lifetime bound on the CompilerCalls trait then helps to ensure compiler internals don't "escape" the compiler (e.g. if you tried to keep a reference to the AST after the compiler is finished), while still letting users record some state for use after the run_compiler() function finishes.
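To illustrate why a "push"-style API fits this situation, here is a minimal sketch. It does not use the real rustc_driver types: Ast, State, Callbacks, and run are all hypothetical stand-ins. The callback may inspect borrowed compiler data while the "compiler" is running, but only owned results (here, a count) can outlive the run.

```rust
/// Hypothetical stand-in for data owned by the compiler.
struct Ast {
    items: Vec<String>,
}

/// A box of optional references, loosely modeled on the description above.
struct State<'a> {
    ast: Option<&'a Ast>,
}

/// Hypothetical callback trait: the compiler pushes state to the user.
trait Callbacks {
    fn after_parse(&mut self, state: &State<'_>);
}

/// A toy "compiler" that owns the AST and drops it when it finishes.
fn run<C: Callbacks>(source: &str, callbacks: &mut C) {
    let ast = Ast {
        items: source.split_whitespace().map(String::from).collect(),
    };
    callbacks.after_parse(&State { ast: Some(&ast) });
    // `ast` is dropped here, so no reference to it can escape `run`.
}

/// A user of the API that records owned data for use after the run.
struct CountItems {
    count: usize,
}

impl Callbacks for CountItems {
    fn after_parse(&mut self, state: &State<'_>) {
        if let Some(ast) = state.ast {
            self.count = ast.items.len();
        }
    }
}

fn count_items(source: &str) -> usize {
    let mut callbacks = CountItems { count: 0 };
    run(source, &mut callbacks);
    callbacks.count
}

fn main() {
    println!("items seen: {}", count_items("fn a fn b"));
}
```

Trying to store the `&Ast` itself inside CountItems would be rejected by the borrow checker, which is the "no escaping" property described above.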

Thread-local storage and interning are used a lot throughout the compiler to reduce duplication while also preventing a lot of the ergonomic issues due to many pervasive lifetimes. The rustc::ty::tls module is used to access these thread-locals, although you should rarely need to touch it.

The walking tour of rustdoc

Rustdoc actually uses the rustc internals directly. It lives in-tree with the compiler and standard library. This chapter is about how it works.

Rustdoc is implemented entirely within the crate librustdoc. It runs the compiler up to the point where we have an internal representation of a crate (HIR) and the ability to run some queries about the types of items. HIR and queries are discussed in the linked chapters.

librustdoc performs two major steps after that to render a set of documentation:

  • "Clean" the AST into a form that's more suited to creating documentation (and slightly more resistant to churn in the compiler).
  • Use this cleaned AST to render a crate's documentation, one page at a time.

Naturally, there's more than just this, and those descriptions simplify out lots of details, but that's the high-level overview.

(Side note: librustdoc is a library crate! The rustdoc binary is created using the project in src/tools/rustdoc. Note that literally all that does is call the main() that's in this crate's lib.rs, though.)

Cheat sheet

  • Use ./x.py build --stage 1 src/libstd src/tools/rustdoc to make a usable rustdoc you can run on other projects.
    • Add src/libtest to be able to use rustdoc --test.
    • If you've used rustup toolchain link local /path/to/build/$TARGET/stage1 previously, then after the previous build command, cargo +local doc will Just Work.
  • Use ./x.py doc --stage 1 src/libstd to use this rustdoc to generate the standard library docs.
    • The completed docs will be available in build/$TARGET/doc/std, though the bundle is meant to be used as though you would copy out the doc folder to a web server, since that's where the CSS/JS and landing page are.
  • Most of the HTML printing code is in html/format.rs and html/render.rs. It's in a bunch of fmt::Display implementations and supplementary functions.
  • The types that got Display impls above are defined in clean/mod.rs, right next to the custom Clean trait used to process them out of the rustc HIR.
  • The bits specific to using rustdoc as a test harness are in test.rs.
  • The Markdown renderer is loaded up in html/markdown.rs, including functions for extracting doctests from a given block of Markdown.
  • The tests on rustdoc output are located in src/test/rustdoc, where they're handled by the test runner of rustbuild and the supplementary script src/etc/htmldocck.py.
  • Tests on search index generation are located in src/test/rustdoc-js, as a series of JavaScript files that encode queries on the standard library search index and expected results.

From crate to clean

In core.rs are two central items: the DocContext struct, and the run_core function. The latter is where rustdoc calls out to rustc to compile a crate to the point where rustdoc can take over. The former is a state container used when crawling through a crate to gather its documentation.

The main process of crate crawling is done in clean/mod.rs through several implementations of the Clean trait defined within. This is a conversion trait, which defines one method:

pub trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}

clean/mod.rs also defines the types for the "cleaned" AST used later on to render documentation pages. Each usually accompanies an implementation of Clean that takes some AST or HIR type from rustc and converts it into the appropriate "cleaned" type. "Big" items like modules or associated items may have some extra processing in their Clean implementations, but for the most part these impls are straightforward conversions. The "entry point" to this module is the impl Clean<Crate> for visit_ast::RustdocVisitor, which is called by run_core above.
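As a rough illustration of what such a conversion impl looks like, here is a self-contained sketch. Only the shape of the Clean trait comes from the text above; DocContext, HirFn, and CleanFn here are simplified, hypothetical stand-ins, and the element-wise impl for Vec is an assumption made for the example.

```rust
/// Hypothetical stand-in for rustdoc's state container.
struct DocContext {
    crate_name: String,
}

/// Same shape as the conversion trait shown above.
trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}

/// Hypothetical compiler-side item.
struct HirFn {
    name: String,
    docs: Vec<String>,
}

/// Hypothetical "cleaned" item, detached from compiler types.
#[derive(Debug, PartialEq)]
struct CleanFn {
    path: String,
    docs: Vec<String>,
}

impl Clean<CleanFn> for HirFn {
    fn clean(&self, cx: &DocContext) -> CleanFn {
        CleanFn {
            // The context supplies information the item itself lacks.
            path: format!("{}::{}", cx.crate_name, self.name),
            docs: self.docs.clone(),
        }
    }
}

/// Assumed convenience impl: collections are cleaned element by element.
impl<T, U> Clean<Vec<U>> for Vec<T>
where
    T: Clean<U>,
{
    fn clean(&self, cx: &DocContext) -> Vec<U> {
        self.iter().map(|item| item.clean(cx)).collect()
    }
}

fn demo() -> Vec<CleanFn> {
    let cx = DocContext { crate_name: "mycrate".to_string() };
    let fns = vec![HirFn {
        name: "foo".to_string(),
        docs: vec!["Does foo.".to_string()],
    }];
    fns.clean(&cx)
}

fn main() {
    println!("{:?}", demo());
}
```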

You see, I actually lied a little earlier: There's another AST transformation that happens before the events in clean/mod.rs. In visit_ast.rs is the type RustdocVisitor, which actually crawls a hir::Crate to get the first intermediate representation, defined in doctree.rs. This pass is mainly to get a few intermediate wrappers around the HIR types and to process visibility and inlining. This is where #[doc(inline)], #[doc(no_inline)], and #[doc(hidden)] are processed, as well as the logic for whether a pub use should get the full page or a "Reexport" line in the module page.

The other major thing that happens in clean/mod.rs is the collection of doc comments and #[doc=""] attributes into a separate field of the Attributes struct, present on anything that gets hand-written documentation. This makes it easier to collect this documentation later in the process.

The primary output of this process is a clean::Crate with a tree of Items which describe the publicly-documentable items in the target crate.

Hot potato

Before moving on to the next major step, a few important "passes" occur over the documentation. These do things like combine the separate "attributes" into a single string and strip leading whitespace to make the document easier on the markdown parser, or drop items that are not public or deliberately hidden with #[doc(hidden)]. These are all implemented in the passes/ directory, one file per pass. By default, all of these passes are run on a crate, but the ones regarding dropping private/hidden items can be bypassed by passing --document-private-items to rustdoc. Note that unlike the previous set of AST transformations, the passes happen on the cleaned crate.

(Strictly speaking, you can fine-tune the passes run and even add your own, but we're trying to deprecate that. If you need finer-grain control over these passes, please let us know!)

Here is the current (as of this writing) list of passes:

  • propagate-doc-cfg propagates #[doc(cfg(...))] to child items.
  • collapse-docs concatenates all document attributes into one document attribute. This is necessary because each line of a doc comment is given as a separate doc attribute, and this will combine them into a single string with line breaks between each attribute.
  • unindent-comments removes excess indentation on comments in order for markdown to like it. This is necessary because the convention for writing documentation is to provide a space between the /// or //! marker and the text, and stripping that leading space will make the text easier to parse by the Markdown parser. (In the past, the markdown parser used was not CommonMark-compliant, which caused annoyances with extra whitespace, but this seems to be less of an issue today.)
  • strip-priv-imports strips all private import statements (use, extern crate) from a crate. This is necessary because rustdoc will handle public imports by either inlining the item's documentation to the module or creating a "Reexports" section with the import in it. The pass ensures that all of these imports are actually relevant to documentation.
  • strip-hidden and strip-private strip all doc(hidden) and private items from the output. strip-private implies strip-priv-imports. Basically, the goal is to remove items that are not relevant for public documentation.
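The effect of collapse-docs followed by unindent-comments can be sketched with two small string functions. This is a simplified model written for this guide, not the code in passes/: it assumes doc attributes arrive as one string per line and that only ASCII spaces are used for indentation.

```rust
/// Joins per-line doc attributes into a single string with line breaks
/// (the idea behind collapse-docs).
fn collapse_docs(attrs: &[&str]) -> String {
    attrs.join("\n")
}

/// Number of leading ASCII spaces on a line.
fn leading_spaces(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Removes the indentation shared by all non-blank lines
/// (the idea behind unindent-comments).
fn unindent(doc: &str) -> String {
    let min_indent = doc
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(leading_spaces)
        .min()
        .unwrap_or(0);
    doc.lines()
        .map(|line| {
            if line.trim().is_empty() {
                "" // blank lines carry no indentation worth keeping
            } else {
                &line[min_indent..]
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Each `///` line becomes one attribute, including its leading space.
    let attrs = [" Adds two numbers.", "", " Example:", "", "     add(1, 2)"];
    let collapsed = collapse_docs(&attrs);
    println!("{}", unindent(&collapsed));
}
```

Note that relative indentation survives: the example line stays indented by four spaces after the shared leading space is removed, which matters for Markdown code blocks.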

From clean to crate

This is where the "second phase" in rustdoc begins. This phase primarily lives in the html/ folder, and it all starts with run() in html/render.rs. This code is responsible for setting up the Context, SharedContext, and Cache which are used during rendering, copying out the static files which live in every rendered set of documentation (things like the fonts, CSS, and JavaScript that live in html/static/), creating the search index, and printing out the source code rendering, before beginning the process of rendering all the documentation for the crate.

Several functions implemented directly on Context take the clean::Crate and set up some state between rendering items or recursing on a module's child items. From here the "page rendering" begins, via an enormous write!() call in html/layout.rs. The parts that actually generate HTML from the items and documentation occur within a series of std::fmt::Display implementations and functions that pass around a &mut std::fmt::Formatter. The top-level implementation that writes out the page body is the impl<'a> fmt::Display for Item<'a> in html/render.rs, which switches out to one of several item_* functions based on the kind of Item being rendered.

Depending on what kind of rendering code you're looking for, you'll probably find it either in html/render.rs for major items like "what sections should I print for a struct page" or html/format.rs for smaller component pieces like "how should I print a where clause as part of some other item".
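The Display-based rendering pattern described above can be sketched as follows. The Item and ItemKind types and the item_struct and item_function helpers are hypothetical, heavily simplified stand-ins; real rendering also has to escape HTML, which this sketch skips.

```rust
use std::fmt;

/// Hypothetical cleaned item kinds.
enum ItemKind {
    Struct { fields: Vec<String> },
    Function { signature: String },
}

struct Item {
    name: String,
    kind: ItemKind,
}

/// Top-level page body: writes a heading, then dispatches on the item kind.
impl fmt::Display for Item {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "<h1>{}</h1>", self.name)?;
        match &self.kind {
            ItemKind::Struct { fields } => item_struct(f, fields),
            ItemKind::Function { signature } => item_function(f, signature),
        }
    }
}

/// Helper that receives the formatter, in the style of the item_* functions.
fn item_struct(f: &mut fmt::Formatter<'_>, fields: &[String]) -> fmt::Result {
    write!(f, "<ul>")?;
    for field in fields {
        write!(f, "<li>{}</li>", field)?;
    }
    write!(f, "</ul>")
}

fn item_function(f: &mut fmt::Formatter<'_>, signature: &str) -> fmt::Result {
    write!(f, "<pre>{}</pre>", signature)
}

fn main() {
    let item = Item {
        name: "Point".to_string(),
        kind: ItemKind::Struct {
            fields: vec!["x".to_string(), "y".to_string()],
        },
    };
    println!("{}", item);
}
```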

Whenever rustdoc comes across an item that should print hand-written documentation alongside, it calls out to html/markdown.rs which interfaces with the Markdown parser. This is exposed as a series of types that wrap a string of Markdown, and implement fmt::Display to emit HTML text. It takes special care to enable certain features like footnotes and tables and add syntax highlighting to Rust code blocks (via html/highlight.rs) before running the Markdown parser. There's also a function in here (find_testable_code) that specifically scans for Rust code blocks so the test-runner code can find all the doctests in the crate.

From soup to nuts

(alternate title: "An unbroken thread that stretches from those first Cells to us")

It's important to note that the AST cleaning can ask the compiler for information (crucially, DocContext contains a TyCtxt), but page rendering cannot. The clean::Crate created within run_core is passed outside the compiler context before being handed to html::render::run. This means that a lot of the "supplementary data" that isn't immediately available inside an item's definition, like which trait is the Deref trait used by the language, needs to be collected during cleaning, stored in the DocContext, and passed along to the SharedContext during HTML rendering. This manifests as a bunch of shared state, context variables, and RefCells.

Also of note is that some items that come from "asking the compiler" don't go directly into the DocContext - for example, when loading items from a foreign crate, rustdoc will ask about trait implementations and generate new Items for the impls based on that information. This goes directly into the returned Crate rather than roundabout through the DocContext. This way, these implementations can be collected alongside the others, right before rendering the HTML.

Other tricks up its sleeve

All this describes the process for generating HTML documentation from a Rust crate, but there are a couple of other major modes that rustdoc runs in. It can also be run on a standalone Markdown file, or it can run doctests on Rust code or standalone Markdown files. For the former, it shortcuts straight to html/markdown.rs, optionally including a mode which inserts a Table of Contents to the output HTML.

For the latter, rustdoc runs a similar partial-compilation to get relevant documentation in test.rs, but instead of going through the full clean and render process, it runs a much simpler crate walk to grab just the hand-written documentation. Combined with the aforementioned "find_testable_code" in html/markdown.rs, it builds up a collection of tests to run before handing them off to the libtest test runner. One notable location in test.rs is the function make_test, which is where hand-written doctests get transformed into something that can be executed.
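To give a feel for the kind of transformation involved, here is a deliberately naive sketch of wrapping a doctest body so it can be compiled as a program. The function make_test_sketch is hypothetical and covers only one assumed step (adding a main function when none is present); it should not be read as a description of what the real make_test does.

```rust
/// Wraps a doctest body in `fn main` unless it already declares one.
/// (Naive: a plain substring check stands in for real parsing.)
fn make_test_sketch(doctest: &str) -> String {
    if doctest.contains("fn main") {
        return doctest.to_string();
    }
    let mut program = String::from("fn main() {\n");
    for line in doctest.lines() {
        program.push_str("    ");
        program.push_str(line);
        program.push('\n');
    }
    program.push_str("}\n");
    program
}

fn main() {
    let doctest = "let x = 1 + 1;\nassert_eq!(x, 2);";
    print!("{}", make_test_sketch(doctest));
}
```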

Some extra reading about make_test can be found here.

Dotting i's and crossing t's

So that's rustdoc's code in a nutshell, but there are more things in the repo that deal with it. Since we have the full compiletest suite at hand, there's a set of tests in src/test/rustdoc that make sure the final HTML is what we expect in various situations. These tests also use a supplementary script, src/etc/htmldocck.py, that allows them to look through the final HTML using XPath notation to get a precise look at the output. The full description of all the commands available to rustdoc tests is in htmldocck.py.

In addition, there are separate tests for the search index and rustdoc's ability to query it. The files in src/test/rustdoc-js each contain a different search query and the expected results, broken out by search tab. These files are processed by a script in src/tools/rustdoc-js and the Node.js runtime. These tests don't have as thorough of a writeup, but a broad example that features results in all tabs can be found in basic.js. The basic idea is that you match a given QUERY with a set of EXPECTED results, complete with the full item path of each item.

Queries: demand-driven compilation

As described in the high-level overview of the compiler, the Rust compiler is currently transitioning from a traditional "pass-based" setup to a "demand-driven" system. The Compiler Query System is the key to our new demand-driven organization. The idea is pretty simple. You have various queries that compute things about the input – for example, there is a query called type_of(def_id) that, given the def-id of some item, will compute the type of that item and return it to you.

Query execution is memoized – so the first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you do a query, the result may be returned to you by loading stored data from disk (but that's a separate topic we won't discuss further here).
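The memoization idea can be sketched in isolation. The Ctxt type below is a hypothetical miniature with a single query keyed by a plain integer; it ignores incremental compilation and everything else the real query system does, and the counter exists only to make the caching observable.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical miniature context with one memoized query.
struct Ctxt {
    cache: RefCell<HashMap<u32, String>>,
    /// Counts how often the expensive computation actually ran.
    computations: Cell<u32>,
}

impl Ctxt {
    fn new() -> Self {
        Ctxt {
            cache: RefCell::new(HashMap::new()),
            computations: Cell::new(0),
        }
    }

    /// The query method: consult the cache first, compute only on a miss.
    fn type_of(&self, def_id: u32) -> String {
        let cached = self.cache.borrow().get(&def_id).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        let result = self.compute_type_of(def_id);
        self.cache.borrow_mut().insert(def_id, result.clone());
        result
    }

    /// Stand-in for the real work behind the query.
    fn compute_type_of(&self, def_id: u32) -> String {
        self.computations.set(self.computations.get() + 1);
        format!("Type{}", def_id)
    }
}

fn main() {
    let cx = Ctxt::new();
    let first = cx.type_of(7);
    let second = cx.type_of(7); // served from the hashtable
    println!("{} {} ({} computation)", first, second, cx.computations.get());
}
```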

The overall vision is that, eventually, the entire compiler control-flow will be query driven. There will effectively be one top-level query ("compile") that will run compilation on a crate; this will in turn demand information about that crate, starting from the end. For example:

  • This "compile" query might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM).
  • But computing the list of codegen-units would invoke some subquery that returns the list of all modules defined in the Rust source.
  • That query in turn would invoke something asking for the HIR.
  • This keeps going further and further back until we wind up doing the actual parsing.

However, that vision is not fully realized. Still, big chunks of the compiler (for example, generating MIR) work exactly like this.

Invoking queries

To invoke a query is simple. The tcx ("type context") offers a method for each defined query. So, for example, to invoke the type_of query, you would just do this:

let ty = tcx.type_of(some_def_id);

Cycles between queries

A cycle is when a query becomes stuck in a loop e.g. query A generates query B which generates query A again.

Currently, cycles during query execution should always result in a compilation error. Typically, they arise because of illegal programs that contain cyclic references they shouldn't (though sometimes they arise because of compiler bugs, in which case we need to factor our queries in a more fine-grained fashion to avoid them).
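One common way to detect such a cycle is to keep a stack of the queries currently being executed and check it before starting a new one. The sketch below shows that technique on a toy dependency table; the Engine type is hypothetical and this is not a description of how rustc itself tracks active queries.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical engine: each query key lists the queries it invokes.
struct Engine {
    deps: HashMap<&'static str, Vec<&'static str>>,
    /// Queries that have started but not yet finished.
    active: RefCell<Vec<&'static str>>,
}

impl Engine {
    fn new(deps: Vec<(&'static str, Vec<&'static str>)>) -> Self {
        Engine {
            deps: deps.into_iter().collect(),
            active: RefCell::new(Vec::new()),
        }
    }

    /// Runs `key` and its dependencies. Returns how many queries were
    /// executed, or the chain of queries that forms a cycle.
    fn run(&self, key: &'static str) -> Result<usize, Vec<&'static str>> {
        if self.active.borrow().contains(&key) {
            let mut cycle = self.active.borrow().clone();
            cycle.push(key);
            return Err(cycle);
        }
        self.active.borrow_mut().push(key);
        let mut result = Ok(1);
        for &dep in self.deps.get(key).map(Vec::as_slice).unwrap_or(&[]) {
            match self.run(dep) {
                Ok(n) => result = result.map(|total| total + n),
                Err(cycle) => {
                    result = Err(cycle);
                    break;
                }
            }
        }
        // Always pop, so the stack stays consistent after an error.
        self.active.borrow_mut().pop();
        result
    }
}

fn main() {
    let engine = Engine::new(vec![("A", vec!["B"]), ("B", vec!["A"])]);
    match engine.run("A") {
        Ok(n) => println!("ran {} queries", n),
        Err(cycle) => println!("cycle detected: {}", cycle.join(" -> ")),
    }
}
```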

However, it is nonetheless often useful to recover from a cycle (after reporting an error, say) and try to soldier on, so as to give a better user experience. In order to recover from a cycle, you don't get to use the nice method-call-style syntax. Instead, you invoke using the try_get method, which looks roughly like this:

use ty::queries;
...
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
  Ok(result) => {
    // no cycle occurred! You can use `result`
  }
  Err(err) => {
    // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
    // meaning essentially an "in-progress", not-yet-reported error message.
    // See below for more details on what to do here.
  }
}

So, if you get back an Err from try_get, then a cycle did occur. This means that you must ensure that a compiler error message is reported. You can do that in two ways:

The simplest is to invoke err.emit(). This will emit the cycle error to the user.

However, often cycles happen because of an illegal program, and you know at that point that an error either already has been reported or will be reported due to this cycle by some other bit of code. In that case, you can invoke err.cancel() to not emit any error. It is traditional to then invoke:

tcx.sess.delay_span_bug(some_span, "some message")

delay_span_bug() is a helper that says: we expect a compilation error to have happened or to happen in the future; so, if compilation ultimately succeeds, make an ICE with the message "some message". This is basically just a precaution in case you are wrong.
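The logic of a delayed bug can be modeled in a few lines. The Session type below is a hypothetical miniature, not the real rustc session: it records delayed messages and, when compilation finishes, reports them only if no ordinary error was ever emitted.

```rust
/// Hypothetical miniature session that tracks errors and delayed bugs.
#[derive(Default)]
struct Session {
    errors_emitted: usize,
    delayed_bugs: Vec<String>,
}

impl Session {
    /// Reports an ordinary compilation error to the user.
    fn emit_error(&mut self, message: &str) {
        eprintln!("error: {}", message);
        self.errors_emitted += 1;
    }

    /// Records a message that only matters if no error is ever reported.
    fn delay_bug(&mut self, message: &str) {
        self.delayed_bugs.push(message.to_string());
    }

    /// Called when compilation ends. A delayed bug with no accompanying
    /// error means our expectation was wrong, which this sketch surfaces
    /// as `Err` (the real compiler would raise an ICE).
    fn finish(self) -> Result<(), Vec<String>> {
        if self.errors_emitted == 0 && !self.delayed_bugs.is_empty() {
            Err(self.delayed_bugs)
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut sess = Session::default();
    sess.delay_bug("cycle error should have been reported");
    // No error was emitted, so the delayed bug fires.
    println!("{:?}", sess.finish());
}
```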

How the compiler executes a query

So you may be wondering what happens when you invoke a query method. The answer is that, for each query, the compiler maintains a cache – if your query has already been executed, then, the answer is simple: we clone the return value out of the cache and return it (therefore, you should try to ensure that the return types of queries are cheaply cloneable; insert an Rc if necessary).

Providers

If, however, the query is not in the cache, then the compiler will try to find a suitable provider. A provider is a function that has been defined and linked into the compiler somewhere that contains the code to compute the result of the query.

Providers are defined per-crate. The compiler maintains, internally, a table of providers for every crate, at least conceptually. Right now, there are really two sets: the providers for queries about the local crate (that is, the one being compiled) and providers for queries about external crates (that is, dependencies of the local crate). Note that what determines the crate that a query is targeting is not the kind of query, but the key. For example, when you invoke tcx.type_of(def_id), that could be a local query or an external query, depending on what crate the def_id is referring to (see the self::keys::Key trait for more information on how that works).

Providers always have the same signature:

fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
                       key: QUERY_KEY)
                       -> QUERY_RESULT
{
    ...
}

Providers take two arguments: the tcx and the query key. Note also that they take the global tcx (i.e. they use the 'tcx lifetime twice), rather than taking a tcx with some active inference context. They return the result of the query.

How providers are set up

When the tcx is created, it is given the providers by its creator using the Providers struct. This struct is generated by the macros here, but it is basically a big list of function pointers:

struct Providers {
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
    ...
}

At present, we have one copy of the struct for local crates, and one for external crates, though the plan is that we may eventually have one per crate.

These Providers structs are ultimately created and populated by librustc_driver, but it does this by distributing the work throughout the other rustc_* crates. This is done by invoking various provide functions. These functions tend to look something like this:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}

That is, they take an &mut Providers and mutate it in place. Usually we use the formulation above just because it looks nice, but you could as well do providers.type_of = type_of, which would be equivalent. (Here, type_of would be a top-level function, defined as we saw before.) So, if we want to add a provider for some other query, let's call it fubar, into the crate above, we might modify the provide() function like so:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { ... }

N.B. Most of the rustc_* crates only provide local providers. Almost all extern providers wind up going through the rustc_metadata crate, which loads the information from the crate metadata. But in some cases there are crates that provide queries for both local and external crates, in which case they define both a provide and a provide_extern function that rustc_driver can invoke.
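The pieces above (a table of function pointers, provide functions that fill it in, and dispatch based on the crate of the key) can be combined into a small runnable model. Everything here is a simplified, hypothetical stand-in: DefId is reduced to two integers, the providers take no tcx, caching is omitted, and the function names are invented for the example.

```rust
/// Hypothetical, simplified def-id: a crate number plus an index.
#[derive(Clone, Copy)]
struct DefId {
    krate: u32,
    index: u32,
}

const LOCAL_CRATE: u32 = 0;

/// A table of function pointers, one per query.
#[derive(Clone, Copy)]
struct Providers {
    type_of: fn(DefId) -> String,
    fubar: fn(DefId) -> u32,
}

fn unset_type_of(_: DefId) -> String {
    panic!("no provider registered for type_of")
}

fn unset_fubar(_: DefId) -> u32 {
    panic!("no provider registered for fubar")
}

impl Default for Providers {
    fn default() -> Self {
        Providers { type_of: unset_type_of, fubar: unset_fubar }
    }
}

fn local_type_of(id: DefId) -> String {
    format!("local type #{}", id.index)
}

fn local_fubar(id: DefId) -> u32 {
    id.index * 2
}

fn extern_type_of(id: DefId) -> String {
    format!("type #{} loaded from metadata of crate {}", id.index, id.krate)
}

/// Each "crate" fills in only the entries it knows about.
fn provide_type_of(providers: &mut Providers) {
    *providers = Providers { type_of: local_type_of, ..*providers };
}

fn provide_fubar(providers: &mut Providers) {
    providers.fubar = local_fubar; // equivalent to the struct-update form
}

fn provide_extern(providers: &mut Providers) {
    *providers = Providers { type_of: extern_type_of, ..*providers };
}

/// Miniature context holding one table per kind of crate.
struct Ctxt {
    local: Providers,
    external: Providers,
}

impl Ctxt {
    /// The key, not the query, decides which table is consulted.
    fn providers_for(&self, id: DefId) -> &Providers {
        if id.krate == LOCAL_CRATE { &self.local } else { &self.external }
    }

    fn type_of(&self, id: DefId) -> String {
        (self.providers_for(id).type_of)(id)
    }

    fn fubar(&self, id: DefId) -> u32 {
        (self.providers_for(id).fubar)(id)
    }
}

fn build_ctxt() -> Ctxt {
    let mut local = Providers::default();
    provide_type_of(&mut local);
    provide_fubar(&mut local);
    let mut external = Providers::default();
    provide_extern(&mut external);
    Ctxt { local, external }
}

fn main() {
    let cx = build_ctxt();
    println!("{}", cx.type_of(DefId { krate: LOCAL_CRATE, index: 3 }));
    println!("{}", cx.type_of(DefId { krate: 2, index: 5 }));
    println!("{}", cx.fubar(DefId { krate: LOCAL_CRATE, index: 4 }));
}
```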

Adding a new kind of query

So suppose you want to add a new kind of query, how do you do so? Well, defining a query takes place in two steps:

  1. first, you have to specify the query name and arguments; and then,
  2. you have to supply query providers where needed.

To specify the query name and arguments, you simply add an entry to the big macro invocation in src/librustc/ty/query/mod.rs, which looks something like:

define_queries! { <'tcx>
    /// Records the type of every item.
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,

    ...
}

Each line of the macro defines one query. The name is broken up like this:

[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
|     |        |          |         |
|     |        |          |         result type of query
|     |        |          query key type
|     |        dep-node constructor
|     name of query
query flags

Let's go over them one by one:

  • Query flags: these are largely unused right now, but the intention is that we'll be able to customize various aspects of how the query is processed.
  • Name of query: the name of the query method (tcx.type_of(..)). Also used as the name of a struct (ty::queries::type_of) that will be generated to represent this query.
  • Dep-node constructor: indicates the constructor function that connects this query to incremental compilation. Typically, this is a DepNode variant, which can be added by modifying the define_dep_nodes! macro invocation in librustc/dep_graph/dep_node.rs.
    • However, sometimes we use a custom function, in which case the name will be in snake case and the function will be defined at the bottom of the file. This is typically used when the query key is not a def-id, or just not the type that the dep-node expects.
  • Query key type: the type of the argument to this query. This type must implement the ty::query::keys::Key trait, which defines (for example) how to map it to a crate, and so forth.
  • Result type of query: the type produced by this query. This type should (a) not use RefCell or other interior mutability and (b) be cheaply cloneable. Interning or using Rc or Arc is recommended for non-trivial data types.
    • The one exception to those rules is the ty::steal::Steal type, which is used to cheaply modify MIR in place. See the definition of Steal for more details. New uses of Steal should not be added without alerting @rust-lang/compiler.

So, to add a query:

  • Add an entry to define_queries! using the format above.
  • Possibly add a corresponding entry to the dep-node macro.
  • Link the provider by modifying the appropriate provide method; or add a new one if needed and ensure that rustc_driver is invoking it.
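To get a feel for what a macro like define_queries! does with each entry, here is a much-reduced sketch. It is an assumption-laden toy: it drops the query flags and the dep-node constructor, does no caching, and the generated TyCtxt and Providers types as well as the provider functions are hypothetical stand-ins, not the real generated code.

```rust
type DefId = u32;

// Toy macro: from a list of query signatures, generate a `Providers`
// struct (one fn pointer per query) and a `TyCtxt` with one method per query.
macro_rules! define_queries {
    ($( $(#[$attr:meta])* fn $name:ident($K:ty) -> $V:ty, )*) => {
        struct Providers {
            $( $name: fn($K) -> $V, )*
        }

        struct TyCtxt {
            providers: Providers,
        }

        impl TyCtxt {
            $(
                $(#[$attr])*
                fn $name(&self, key: $K) -> $V {
                    // A real implementation would consult a cache first.
                    (self.providers.$name)(key)
                }
            )*
        }
    };
}

define_queries! {
    /// Records the type of every item.
    fn type_of(DefId) -> String,
    /// A made-up second query.
    fn fubar(DefId) -> usize,
}

fn type_of_provider(key: DefId) -> String {
    format!("Ty#{}", key)
}

fn fubar_provider(key: DefId) -> usize {
    key as usize * 2
}

fn make_tcx() -> TyCtxt {
    TyCtxt {
        providers: Providers { type_of: type_of_provider, fubar: fubar_provider },
    }
}

fn main() {
    let tcx = make_tcx();
    println!("{} {}", tcx.type_of(3), tcx.fubar(4));
}
```

Adding a query to this toy means adding one line to the macro invocation and one provider function, which is the same shape as the steps listed above.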

Query structs and descriptions

For each kind, the define_queries macro will generate a "query struct" named after the query. This struct is a kind of a place-holder describing the query. Each such struct implements the self::config::QueryConfig trait, which has associated types for the key/value of that particular query. Basically the code generated looks something like this:

// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
  type Key = DefId;
  type Value = Ty<'tcx>;
}

There is an additional trait that you may wish to implement called self::config::QueryDescription. This trait is used during cycle errors to give a "human readable" name for the query, so that we can summarize what was happening when the cycle occurred. Implementing this trait is optional if the query key is DefId, but if you don't implement it, you get a pretty generic error ("processing foo..."). You can put new impls into the config module. They look something like this:

impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.item_path_str(key))
    }
}
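The relationship between the two traits can be sketched outside the compiler as follows. This is only an illustration under simplifying assumptions: the traits are redefined locally, describe takes no tcx argument, DefId is a toy newtype, and cycle_note is a hypothetical helper showing how generic code might use the description.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for the compiler's DefId.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DefId(u32);

trait QueryConfig {
    type Key;
    type Value;
}

trait QueryDescription: QueryConfig {
    fn describe(key: Self::Key) -> String;
}

// Dummy struct representing one particular kind of query.
#[allow(non_camel_case_types, dead_code)]
struct type_of<'tcx> {
    phantom: PhantomData<&'tcx ()>,
}

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = &'tcx str;
}

impl<'tcx> QueryDescription for type_of<'tcx> {
    fn describe(key: DefId) -> String {
        format!("computing the type of `{:?}`", key)
    }
}

// Generic code (for example cycle reporting) can work with any query kind.
fn cycle_note<Q: QueryDescription>(key: Q::Key) -> String {
    format!("cycle detected while {}", Q::describe(key))
}

fn main() {
    println!("{}", cycle_note::<type_of>(DefId(7)));
}
```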

Incremental compilation

The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. We'll start by describing a slightly simplified variant of the real thing – the "basic algorithm" – and then describe some possible improvements.

The basic algorithm

The basic algorithm is called the red-green algorithm [1]. The high-level idea is that, after each run of the compiler, we will save the results of all the queries that we do, as well as the query DAG. The query DAG is a DAG that indexes which queries executed which other queries. So, for example, there would be an edge from a query Q1 to another query Q2 if computing Q1 required computing Q2 (note that because queries cannot depend on themselves, this results in a DAG and not a general graph).

On the next run of the compiler, then, we can sometimes reuse these query results to avoid re-executing a query. We do this by assigning every query a color:

  • If a query is colored red, that means that its result during this compilation has changed from the previous compilation.
  • If a query is colored green, that means that its result is the same as the previous compilation.

There are two key insights here:

  • First, if all the inputs to query Q are colored green, then the query Q must result in the same value as last time and hence need not be re-executed (or else the compiler is not deterministic).
  • Second, even if some inputs to a query change, it may be that it still produces the same result as the previous compilation. In particular, the query may only use part of its input.
    • Therefore, after executing a query, we always check whether it produced the same result as the previous time. If it did, we can still mark the query as green, and hence avoid re-executing dependent queries.
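The second insight can be illustrated with a tiny experiment. This is a sketch only: first_line is a hypothetical query that looks at part of its input, and std's DefaultHasher stands in for whatever stable hashing the compiler actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash a query result so that two runs can be compared cheaply.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Hypothetical query that only uses part of its input: the first line.
fn first_line(source: &str) -> String {
    source.lines().next().unwrap_or("").to_string()
}

// True if the query would be colored green even though its input changed.
fn stays_green(old_source: &str, new_source: &str) -> bool {
    fingerprint(&first_line(old_source)) == fingerprint(&first_line(new_source))
}

fn main() {
    // Only the body changed, so the result of `first_line` is the same.
    println!("{}", stays_green("fn a()\nold body", "fn a()\nnew body"));
    // The first line changed, so the result differs.
    println!("{}", stays_green("fn a()\nbody", "fn b()\nbody"));
}
```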

The try-mark-green algorithm

At the core of incremental compilation is an algorithm called "try-mark-green". It has the job of determining the color of a given query Q (which must not have yet been executed). In cases where Q has red inputs, determining Q's color may involve re-executing Q so that we can compare its output, but if all of Q's inputs are green, then we can conclude that Q must be green without re-executing it or inspecting its value at all. In the compiler, this allows us to avoid deserializing the result from disk when we don't need it, and in fact enables us to sometimes skip serializing the result as well (see the refinements section below).

Try-mark-green works as follows:

  • First check if the query Q was executed during the previous compilation.
    • If not, we can just re-execute the query as normal, and assign it the color of red.
  • If yes, then we load the reads(Q) vector from the query DAG. The "reads" is the set of queries that Q executed during its execution.
    • For each query R in reads(Q), we recursively demand the color of R using try-mark-green.
      • Note: it is important that we visit each node in reads(Q) in the same order as they occurred in the original compilation. See the section on the query DAG below.
      • If any of the nodes in reads(Q) wind up colored red, then Q is dirty.
        • We re-execute Q and compare the hash of its result to the hash of the result from the previous compilation.
        • If the hash has not changed, we can mark Q as green and return; otherwise, Q is colored red.
      • Otherwise, all of the nodes in reads(Q) must be green. In that case, we can color Q as green and return.
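The steps above can be sketched as a small self-contained program. It is a toy model, not the code in the compiler: queries are identified by strings, "executing" a query just returns a result hash from a hypothetical fake_execute function, and the colors of the input nodes (src_a, src_b) are assumed to be seeded up front by the caller.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Red,
    Green,
}

type QueryId = &'static str;

// What the previous compilation saved for one query (hypothetical layout).
struct PrevNode {
    reads: Vec<QueryId>, // in the order the reads originally happened
    result_hash: u64,
}

struct Session {
    prev: HashMap<QueryId, PrevNode>,
    colors: HashMap<QueryId, Color>, // inputs are pre-colored by the caller
    execute: fn(QueryId) -> u64,     // stand-in: runs a query, returns its result hash
    executed: Vec<QueryId>,          // log of queries that were re-executed
}

impl Session {
    fn run(&mut self, q: QueryId) -> u64 {
        self.executed.push(q);
        (self.execute)(q)
    }

    fn try_mark_green(&mut self, q: QueryId) -> Color {
        if let Some(&color) = self.colors.get(q) {
            return color;
        }
        let saved = self.prev.get(q).map(|n| (n.reads.clone(), n.result_hash));
        let color = match saved {
            // Not executed last time: run it and color it red.
            None => {
                self.run(q);
                Color::Red
            }
            Some((reads, old_hash)) => {
                // Visit reads in their original order; stop at the first red one.
                let dirty = reads.into_iter().any(|r| self.try_mark_green(r) == Color::Red);
                if !dirty {
                    Color::Green
                } else if self.run(q) == old_hash {
                    Color::Green // re-executed, but the result did not change
                } else {
                    Color::Red
                }
            }
        };
        self.colors.insert(q, color);
        color
    }
}

// Pretend results: `parse_a` hashes to the same value as last time.
fn fake_execute(q: QueryId) -> u64 {
    match q {
        "parse_a" => 11,
        "parse_b" => 22,
        _ => 0,
    }
}

fn demo() -> (Color, Vec<QueryId>) {
    let mut prev = HashMap::new();
    prev.insert("parse_a", PrevNode { reads: vec!["src_a"], result_hash: 11 });
    prev.insert("parse_b", PrevNode { reads: vec!["src_b"], result_hash: 22 });
    prev.insert("typeck", PrevNode { reads: vec!["parse_a", "parse_b"], result_hash: 33 });
    let mut colors = HashMap::new();
    colors.insert("src_a", Color::Red); // this input changed
    colors.insert("src_b", Color::Green);
    let mut session = Session { prev, colors, execute: fake_execute, executed: Vec::new() };
    let color = session.try_mark_green("typeck");
    (color, session.executed)
}

fn main() {
    let (color, executed) = demo();
    println!("typeck is {:?}; re-executed: {:?}", color, executed);
}
```

In this scenario only parse_a is re-executed. Because its result hash is unchanged it becomes green, so typeck is marked green without ever running.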

The query DAG

The query DAG code is stored in src/librustc/dep_graph. Construction of the DAG is done by instrumenting the query execution.

One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the order in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this:

fn main_query(tcx) {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}

Now imagine that in the first compilation, main_query starts by executing subquery1, and this returns true. In that case, the next query main_query executes will be subquery2, and subquery3 will not be executed at all.

But now imagine that in the next compilation, the input has changed such that subquery1 returns false. In this case, subquery2 would never execute. If try-mark-green were to visit reads(main_query) out of order, however, it might visit subquery2 before subquery1, and hence execute it. This can lead to ICEs and other problems in the compiler.
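A runnable variant of the example makes the recorded read order visible. The Tcx type here is a hypothetical recording context invented for this sketch; the real tcx does this bookkeeping inside the dep-graph machinery.

```rust
// Toy context that records the order in which subqueries are read.
struct Tcx {
    subquery1_result: bool,
    reads: Vec<&'static str>,
}

impl Tcx {
    fn subquery1(&mut self) -> bool {
        self.reads.push("subquery1");
        self.subquery1_result
    }

    fn subquery2(&mut self) -> u32 {
        self.reads.push("subquery2");
        2
    }

    fn subquery3(&mut self) -> u32 {
        self.reads.push("subquery3");
        3
    }
}

fn main_query(tcx: &mut Tcx) -> u32 {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}

// Runs `main_query` and returns the ordered list of reads it performed.
fn reads_for(subquery1_result: bool) -> Vec<&'static str> {
    let mut tcx = Tcx { subquery1_result, reads: Vec::new() };
    main_query(&mut tcx);
    tcx.reads
}

fn main() {
    println!("{:?}", reads_for(true));
    println!("{:?}", reads_for(false));
}
```

Since subquery1 always comes first in the recorded list, checking the reads in order discovers that it turned red before subquery2 is ever looked at.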

Improvements to the basic algorithm

In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them is not a particular win. Instead, we save the hashes of all the subqueries that we performed. Then, in select cases, we also save the results.

This is why the incremental algorithm separates computing the color of a node, which often does not require its value, from computing the result of a node. Computing the result is done via a simple algorithm like so:

  • Check if a saved result for Q is available. If so, compute the color of Q. If Q is green, deserialize and return the saved result.
  • Otherwise, execute Q.
    • We can then compare the hash of the result and color Q as green if it did not change.

Footnotes

[1] I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis

Debugging and Testing Dependencies

Testing the dependency graph

There are various ways to write tests against the dependency graph. The simplest mechanisms are the #[rustc_if_this_changed] and #[rustc_then_this_would_need] annotations. These are used in compile-fail tests to test whether the expected set of paths exist in the dependency graph. As an example, see src/test/compile-fail/dep-graph-caller-callee.rs.

The idea is that you can annotate a test like:

#[rustc_if_this_changed]
fn foo() { }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR OK
fn bar() { foo(); }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR no path
fn baz() { }

This will check whether there is a path in the dependency graph from Hir(foo) to TypeckTables(bar). An error is reported for each #[rustc_then_this_would_need] annotation that indicates whether a path exists. //~ ERROR annotations can then be used to test if a path is found (as demonstrated above).
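Conceptually, each annotation boils down to a reachability question on the dependency graph. The sketch below shows that question on a toy graph; the edge list and node labels are made up for illustration and the real test harness works on the compiler's own graph structure.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Breadth-first search: is there a path from `from` to `to`?
fn path_exists<'a>(edges: &[(&'a str, &'a str)], from: &'a str, to: &str) -> bool {
    let mut successors: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(source, target) in edges {
        successors.entry(source).or_default().push(target);
    }
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    queue.push_back(from);
    while let Some(node) = queue.pop_front() {
        if node == to {
            return true;
        }
        // Only expand each node once.
        if seen.insert(node) {
            if let Some(next) = successors.get(node) {
                queue.extend(next.iter().copied());
            }
        }
    }
    false
}

// A made-up graph in which `bar` depends on `foo` but `baz` does not.
const EDGES: &[(&str, &str)] = &[
    ("Hir(foo)", "TypeOfItem(foo)"),
    ("TypeOfItem(foo)", "TypeckTables(bar)"),
    ("Hir(baz)", "TypeckTables(baz)"),
];

fn main() {
    println!("{}", path_exists(EDGES, "Hir(foo)", "TypeckTables(bar)")); // the "OK" case
    println!("{}", path_exists(EDGES, "Hir(foo)", "TypeckTables(baz)")); // the "no path" case
}
```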

Debugging the dependency graph

Dumping the graph

The compiler is also capable of dumping the dependency graph for your debugging pleasure. To do so, pass the -Z dump-dep-graph flag. The graph will be dumped to dep_graph.{txt,dot} in the current directory. You can override the filename with the RUST_DEP_GRAPH environment variable.

Frequently, though, the full dep graph is quite overwhelming and not particularly helpful. Therefore, the compiler also allows you to filter the graph. You can filter in three ways:

  1. All edges originating in a particular set of nodes (usually a single node).
  2. All edges reaching a particular set of nodes.
  3. All edges that lie between given start and end nodes.

To filter, use the RUST_DEP_GRAPH_FILTER environment variable, which should look like one of the following:

source_filter     // nodes originating from source_filter
-> target_filter  // nodes that can reach target_filter
source_filter -> target_filter // nodes in between source_filter and target_filter

source_filter and target_filter are each an &-separated list of strings. A node is considered to match a filter if all of those strings appear in its label. So, for example:

RUST_DEP_GRAPH_FILTER='-> TypeckTables'

would select the predecessors of all TypeckTables nodes. Usually though you want the TypeckTables node for some particular fn, so you might write:

RUST_DEP_GRAPH_FILTER='-> TypeckTables & bar'

This will select only the predecessors of TypeckTables nodes for functions with bar in their name.

Perhaps you are finding that when you change foo you need to re-type-check bar, but you don't think you should have to. In that case, you might do:

RUST_DEP_GRAPH_FILTER='Hir & foo -> TypeckTables & bar'

This will dump out all the nodes that lead from Hir(foo) to TypeckTables(bar), from which you can (hopefully) see the source of the erroneous edge.
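The filter syntax is simple enough to model in a few lines. The following sketch is an assumption about the behaviour described above, not the compiler's parser: PathFilter, parse_filter and node_matches are hypothetical names.

```rust
// The two optional halves of a `source -> target` filter.
#[derive(Debug, PartialEq)]
struct PathFilter {
    source: Option<String>,
    target: Option<String>,
}

// Split a filter string into its source and target halves.
fn parse_filter(spec: &str) -> PathFilter {
    fn side(s: &str) -> Option<String> {
        let s = s.trim();
        if s.is_empty() { None } else { Some(s.to_string()) }
    }
    match spec.find("->") {
        Some(i) => PathFilter { source: side(&spec[..i]), target: side(&spec[i + 2..]) },
        None => PathFilter { source: side(spec), target: None },
    }
}

// A node matches if every `&`-separated string appears in its label.
fn node_matches(filter: &str, label: &str) -> bool {
    filter.split('&').map(str::trim).all(|part| label.contains(part))
}

fn main() {
    let filter = parse_filter("Hir & foo -> TypeckTables & bar");
    println!("{:?}", filter);
    println!("{}", node_matches("TypeckTables & bar", "TypeckTables(bar)"));
}
```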

Tracking down incorrect edges

Sometimes, after you dump the dependency graph, you will find some path that should not exist, but you will not be quite sure how it came to be. When the compiler is built with debug assertions, it can help you track that down. Simply set the RUST_FORBID_DEP_GRAPH_EDGE environment variable to a filter. Every edge created in the dep-graph will be tested against that filter – if it matches, a bug! is reported, so you can easily see the backtrace (RUST_BACKTRACE=1).

The syntax for these filters is the same as described in the previous section. However, note that this filter is applied to every edge and doesn't handle longer paths in the graph, unlike the previous section.

Example:

You find that there is a path from the Hir of foo to the type check of bar and you don't think there should be. You dump the dep-graph as described in the previous section and open dep_graph.txt to see something like:

Hir(foo) -> Collect(bar)
Collect(bar) -> TypeckTables(bar)

That first edge looks suspicious to you. So you set RUST_FORBID_DEP_GRAPH_EDGE to Hir&foo -> Collect&bar, re-run, and then observe the backtrace. Voila, bug fixed!
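Because the filter is tested against single edges, the check is easy to picture. This sketch assumes the filter always has the form source -> target and uses the hypothetical helpers node_matches and edge_matches; it needs Rust 1.52 or later for str::split_once.

```rust
// A node matches if every `&`-separated string appears in its label.
fn node_matches(filter: &str, label: &str) -> bool {
    filter.split('&').map(str::trim).all(|part| label.contains(part))
}

// An edge matches if its source and target match the two halves of the filter.
fn edge_matches(filter: &str, source: &str, target: &str) -> bool {
    match filter.split_once("->") {
        Some((src, tgt)) => node_matches(src, source) && node_matches(tgt, target),
        None => false,
    }
}

fn main() {
    let filter = "Hir&foo -> Collect&bar";
    // The suspicious edge is caught...
    println!("{}", edge_matches(filter, "Hir(foo)", "Collect(bar)"));
    // ...but the next edge on the path is not, since only single edges are tested.
    println!("{}", edge_matches(filter, "Collect(bar)", "TypeckTables(bar)"));
}
```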

The Parser

The parser is responsible for converting raw Rust source code into a structured form which is easier for the compiler to work with, usually called an Abstract Syntax Tree. An AST mirrors the structure of a Rust program in memory, using a Span to link a particular AST node back to its source text.

The bulk of the parser lives in the libsyntax crate.

Like most parsers, this one works in two main steps:

  • lexical analysis – turn a stream of characters into a stream of token trees
  • parsing – turn the token trees into an AST

The syntax crate contains several main players:

  • a SourceMap for mapping AST nodes to their source code,
  • the ast module, which contains types corresponding to each AST node,
  • a StringReader for lexing source code into tokens,
  • the parser module and Parser struct, which are in charge of actually parsing tokens into AST nodes,
  • and a visit module for walking the AST and inspecting or mutating the AST nodes.

The main entrypoint to the parser is via the various parse_* functions in the parser module. They let you do things like turn a SourceFile (e.g. the source in a single file) into a token stream, create a parser from the token stream, and then execute the parser to get a Crate (the root AST node).

To minimise the amount of copying that is done, both the StringReader and Parser have lifetimes which bind them to the parent ParseSess. This contains all the information needed while parsing, as well as the SourceMap itself.

The #[test] attribute

Today, Rust programmers rely on a built-in attribute called #[test]. All you have to do is mark a function as a test and include some asserts like so:

#[test]
fn my_test() {
  assert!(2+2 == 4);
}

When this program is compiled using rustc --test or cargo test, it will produce an executable that can run this, and any other test function. This method of testing allows tests to live alongside code in an organic way. You can even put tests inside private modules:

mod my_priv_mod {
  fn my_priv_func() -> bool { true }

  #[test]
  fn test_priv_func() {
    assert!(my_priv_func());
  }
}

Private items can thus be easily tested without worrying about how to expose them to any sort of external testing apparatus. This is key to the ergonomics of testing in Rust. Semantically, however, it's rather odd. How does any sort of main function invoke these tests if they're not visible? What exactly is rustc --test doing?

#[test] is implemented as a syntactic transformation inside the compiler's libsyntax crate. Essentially, it's a fancy macro that rewrites the crate in three steps:

Step 1: Re-Exporting

As mentioned earlier, tests can exist inside private modules, so we need a way of exposing them to the main function, without breaking any existing code. To that end, libsyntax will create local modules called __test_reexports that recursively reexport tests. This expansion translates the above example into:

mod my_priv_mod {
  fn my_priv_func() -> bool { true }

  pub fn test_priv_func() {
    assert!(my_priv_func());
  }

  pub mod __test_reexports {
    pub use super::test_priv_func;
  }
}

Now, our test can be accessed as my_priv_mod::__test_reexports::test_priv_func. For deeper module structures, __test_reexports will reexport modules that contain tests, so a test at a::b::my_test becomes a::__test_reexports::b::__test_reexports::my_test. While this process seems pretty safe, what happens if there is an existing __test_reexports module? The answer: nothing.

To explain, we need to understand how the AST represents identifiers. The name of every function, variable, module, etc. is not stored as a string, but rather as an opaque Symbol which is essentially an ID number for each identifier. The compiler keeps a separate hashtable that allows us to recover the human-readable name of a Symbol when necessary (such as when printing a syntax error). When the compiler generates the __test_reexports module, it generates a new Symbol for the identifier, so while the compiler-generated __test_reexports may share a name with your hand-written one, it will not share a Symbol. This technique prevents name collision during code generation and is the foundation of Rust's macro hygiene.
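The idea of two identifiers sharing a name but not a Symbol can be shown with a toy interner. This is a sketch under simplifying assumptions, not the compiler's interner: Interner, intern, gensym and as_str are names chosen for this illustration.

```rust
use std::collections::HashMap;

// An opaque ID number standing in for an identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    names: HashMap<String, Symbol>, // name -> symbol, for ordinary identifiers
    strings: Vec<String>,           // symbol -> name, for printing
}

impl Interner {
    // Ordinary identifiers: the same name always yields the same symbol.
    fn intern(&mut self, name: &str) -> Symbol {
        if let Some(&sym) = self.names.get(name) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(name.to_string());
        self.names.insert(name.to_string(), sym);
        sym
    }

    // Generated identifiers: always a fresh symbol, even if the name is taken.
    fn gensym(&mut self, name: &str) -> Symbol {
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(name.to_string());
        sym
    }

    // Recover the human-readable name, e.g. for error messages.
    fn as_str(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let hand_written = interner.intern("__test_reexports");
    let generated = interner.gensym("__test_reexports");
    println!("{:?} vs {:?}", hand_written, generated);
    println!("{} / {}", interner.as_str(hand_written), interner.as_str(generated));
}
```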

Step 2: Harness Generation

Now that our tests are accessible from the root of our crate, we need to do something with them. libsyntax generates a module like so:

pub mod __test {
  extern crate test;
  const TESTS: &'static [self::test::TestDescAndFn] = &[/*...*/];

  #[main]
  pub fn main() {
    self::test::test_static_main(TESTS);
  }
}

While this transformation is simple, it gives us a lot of insight into how tests are actually run. The tests are aggregated into an array and passed to a test runner called test_static_main. We'll come back to exactly what TestDescAndFn is, but for now, the key takeaway is that there is a crate called test that is part of Rust core, that implements all of the runtime for testing. test's interface is unstable, so the only stable way to interact with it is through the #[test] macro.

Step 3: Test Object Generation

If you've written tests in Rust before, you may be familiar with some of the optional attributes available on test functions. For example, a test can be annotated with #[should_panic] if we expect the test to cause a panic. It looks something like this:

#[test]
#[should_panic]
fn foo() {
  panic!("intentional");
}

This means our tests are more than just simple functions: they have configuration information as well. test encodes this configuration data into a struct called TestDesc. For each test function in a crate, libsyntax will parse its attributes and generate a TestDesc instance. It then combines the TestDesc and test function into the predictably named TestDescAndFn struct, that test_static_main operates on. For a given test, the generated TestDescAndFn instance looks like so:

self::test::TestDescAndFn{
  desc: self::test::TestDesc{
    name: self::test::StaticTestName("foo"),
    ignore: false,
    should_panic: self::test::ShouldPanic::Yes,
    allow_fail: false,
  },
  testfn: self::test::StaticTestFn(||
    self::test::assert_test_result(::crate::__test_reexports::foo())),
}

Once we've constructed an array of these test objects, they're passed to the test runner via the harness generated in step 2.
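To make the flow concrete, here is a miniature harness in the same spirit. It is only a sketch: the types are simplified local copies rather than the unstable test crate's definitions, and run_tests is a hypothetical runner that uses std::panic::catch_unwind to observe whether a test panicked.

```rust
use std::panic;

#[derive(Clone, Copy, PartialEq)]
enum ShouldPanic {
    No,
    Yes,
}

struct TestDesc {
    name: &'static str,
    ignore: bool,
    should_panic: ShouldPanic,
}

struct TestDescAndFn {
    desc: TestDesc,
    testfn: fn(),
}

fn adds_up() { assert!(2 + 2 == 4); }
fn panics_on_purpose() { panic!("intentional"); }
fn broken() { assert!(1 + 1 == 3); }

// The aggregated array that the generated harness would pass to the runner.
const TESTS: &[TestDescAndFn] = &[
    TestDescAndFn {
        desc: TestDesc { name: "adds_up", ignore: false, should_panic: ShouldPanic::No },
        testfn: adds_up,
    },
    TestDescAndFn {
        desc: TestDesc { name: "panics_on_purpose", ignore: false, should_panic: ShouldPanic::Yes },
        testfn: panics_on_purpose,
    },
    TestDescAndFn {
        desc: TestDesc { name: "broken", ignore: false, should_panic: ShouldPanic::No },
        testfn: broken,
    },
    TestDescAndFn {
        desc: TestDesc { name: "skipped", ignore: true, should_panic: ShouldPanic::No },
        testfn: broken,
    },
];

// Returns (passed, failed, ignored).
fn run_tests(tests: &[TestDescAndFn]) -> (usize, usize, usize) {
    let (mut passed, mut failed, mut ignored) = (0, 0, 0);
    for test in tests {
        if test.desc.ignore {
            ignored += 1;
            println!("test {} ... ignored", test.desc.name);
            continue;
        }
        let panicked = panic::catch_unwind(test.testfn).is_err();
        // A test passes when it panics exactly if it was expected to.
        if panicked == (test.desc.should_panic == ShouldPanic::Yes) {
            passed += 1;
            println!("test {} ... ok", test.desc.name);
        } else {
            failed += 1;
            println!("test {} ... FAILED", test.desc.name);
        }
    }
    (passed, failed, ignored)
}

fn main() {
    let (passed, failed, ignored) = run_tests(TESTS);
    println!("{} passed; {} failed; {} ignored", passed, failed, ignored);
}
```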

Inspecting the generated code

On nightly Rust, there's an unstable flag called unpretty that you can use to print out the module source after macro expansion:

$ rustc my_mod.rs -Z unpretty=hir

Macro expansion

Macro expansion happens during parsing. rustc has two parsers, in fact: the normal Rust parser, and the macro parser. During the parsing phase, the normal Rust parser will set aside the contents of macros and their invocations. Later, before name resolution, macros are expanded using these portions of the code. The macro parser, in turn, may call the normal Rust parser when it needs to bind a metavariable (e.g. $my_expr) while parsing the contents of a macro invocation. The code for macro expansion is in src/libsyntax/ext/tt/. This chapter aims to explain how macro expansion works.

Example

It's helpful to have an example to refer to. For the remainder of this chapter, whenever we refer to the "example definition", we mean the following:

macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    }
}

$mvar is called a metavariable. Unlike normal variables, rather than binding to a value in a computation, a metavariable binds at compile time to a tree of tokens. A token is a single "unit" of the grammar, such as an identifier (e.g. foo) or punctuation (e.g. =>). There are also other special tokens, such as EOF, which indicates that there are no more tokens. Token trees result from paired parentheses-like characters ((...), [...], and {...}): each one includes the open and close delimiters and all the tokens in between (we do require that parentheses-like characters be balanced).

Having macro expansion operate on token streams rather than the raw bytes of a source file abstracts away a lot of complexity. The macro expander (and much of the rest of the compiler) doesn't really care that much about the exact line and column of some syntactic construct in the code; it cares about what constructs are used in the code. Using tokens allows us to care about what without worrying about where. For more information about tokens, see the Parsing chapter of this book.
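How a flat token sequence turns into token trees can be sketched with a small stack-based grouping pass. This is an illustrative toy, assuming tokens are already split into strings; TokenTree, parse_trees and closing are hypothetical names, not the libsyntax types.

```rust
#[derive(Debug, PartialEq)]
enum TokenTree {
    // A single token such as an identifier or punctuation.
    Token(String),
    // An opening delimiter (its close is implied) and everything in between.
    Delimited(char, Vec<TokenTree>),
}

fn closing(open: char) -> &'static str {
    match open {
        '(' => ")",
        '[' => "]",
        _ => "}",
    }
}

// Group a flat token list into trees, requiring balanced delimiters.
fn parse_trees(tokens: &[&str]) -> Result<Vec<TokenTree>, String> {
    let mut stack: Vec<(char, Vec<TokenTree>)> = Vec::new();
    let mut current: Vec<TokenTree> = Vec::new();
    for &tok in tokens {
        match tok {
            "(" | "[" | "{" => {
                let open = tok.chars().next().unwrap();
                // Set the enclosing level aside and start a fresh one.
                stack.push((open, std::mem::take(&mut current)));
            }
            ")" | "]" | "}" => {
                let (open, parent) = stack
                    .pop()
                    .ok_or_else(|| format!("unexpected closing `{}`", tok))?;
                if closing(open) != tok {
                    return Err(format!("mismatched `{}` and `{}`", open, tok));
                }
                // Finish this level and attach it to its parent.
                let inner = std::mem::replace(&mut current, parent);
                current.push(TokenTree::Delimited(open, inner));
            }
            _ => current.push(TokenTree::Token(tok.to_string())),
        }
    }
    match stack.pop() {
        Some((open, _)) => Err(format!("unclosed `{}`", open)),
        None => Ok(current),
    }
}

fn main() {
    let tokens = ["printer", "!", "(", "print", "foo", ")", ";"];
    println!("{:?}", parse_trees(&tokens));
}
```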

Whenever we refer to the "example invocation", we mean the following snippet:

printer!(print foo); // Assume `foo` is a variable defined somewhere else...

The process of expanding the macro invocation into the syntax tree println!("{}", foo) and then expanding that into a call to Display::fmt is called macro expansion, and it is the topic of this chapter.

The macro parser

There are two parts to macro expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser.

Basically, the macro parser is like an NFA-based regex parser. It uses an algorithm similar in spirit to the Earley parsing algorithm. The macro parser is defined in src/libsyntax/ext/tt/macro_parser.rs.

The interface of the macro parser is as follows (this is slightly simplified):

fn parse(
    sess: &ParseSess,
    tts: TokenStream,
    ms: &[TokenTree]
) -> NamedParseResult

In this interface:

  • sess is a "parsing session", which keeps track of some metadata. Most notably, this is used to keep track of errors that are generated so they can be reported to the user.
  • tts is a stream of tokens. The macro parser's job is to consume the raw stream of tokens and output a binding of metavariables to corresponding token trees.
  • ms is a matcher. This is a sequence of token trees that we want to match tts against.

In the analogy of a regex parser, tts is the input and we are matching it against the pattern ms. Using our examples, tts could be the stream of tokens containing the inside of the example invocation print foo, while ms might be the sequence of token (trees) print $mvar:ident.

The output of the parser is a NamedParseResult, which indicates which of three cases has occurred:

  • Success: tts matches the given matcher ms, and we have produced a binding from metavariables to the corresponding token trees.
  • Failure: tts does not match ms. This results in an error message such as "No rule expected token blah".
  • Error: some fatal error has occurred in the parser. For example, this happens if there is more than one pattern match, since that indicates the macro is ambiguous.

The full interface is defined in src/libsyntax/ext/tt/macro_parser.rs.

The macro parser does pretty much exactly the same as a normal regex parser with one exception: in order to parse different types of metavariables, such as ident, block, expr, etc., the macro parser must sometimes call back to the normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using the macro parser. This is extremely non-intuitive and self-referential. The code to parse macro definitions is in src/libsyntax/ext/tt/macro_rules.rs. It defines the pattern for matching for a macro definition as $( $lhs:tt => $rhs:tt );+. In other words, a macro_rules definition should have in its body at least one occurrence of a token tree followed by => followed by another token tree. When the compiler comes to a macro_rules definition, it uses this pattern to match the two token trees per rule in the definition of the macro using the macro parser itself. In our example definition, the metavariable $lhs would match the patterns of both arms: (print $mvar:ident) and (print twice $mvar:ident). And $rhs would match the bodies of both arms: { println!("{}", $mvar); } and { println!("{}", $mvar); println!("{}", $mvar); }. The parser would keep this knowledge around for when it needs to expand a macro invocation.

When the compiler comes to a macro invocation, it parses that invocation using the same NFA-based macro parser that is described above. However, the matcher used is the first token tree ($lhs) extracted from each arm of the macro definition. Using our example, we would try to match the token stream print foo from the invocation against the matchers print $mvar:ident and print twice $mvar:ident that we previously extracted from the definition. The algorithm is exactly the same, but when the macro parser comes to a place in the current matcher where it needs to match a non-terminal (e.g. $mvar:ident), it calls back to the normal Rust parser to get the contents of that non-terminal. In this case, the Rust parser would look for an ident token, which it finds (foo) and returns to the macro parser. Then, the macro parser proceeds in parsing as normal. Also, note that the arms are tried in order and the first matcher that succeeds is the one that gets expanded; if no arm matches at all, there is a syntax error. An ambiguity error arises when a single matcher can parse the invocation in more than one way.
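Matching a token stream against one matcher can be sketched with a drastically simplified model. It is an assumption-heavy toy: tokens are plain strings, the only fragment kind is ident, there are no repetitions (so no NFA is needed), and Matcher, ParseResult, parse, arm_one and arm_two are hypothetical names rather than the libsyntax API.

```rust
use std::collections::HashMap;

// One element of a matcher: a literal token or an `$name:ident` metavariable.
enum Matcher {
    Literal(&'static str),
    Ident(&'static str),
}

#[derive(Debug, PartialEq)]
enum ParseResult {
    Success(HashMap<String, String>), // metavariable name -> bound token
    Failure(String),
}

// Stand-in for calling back into the normal Rust parser for an `ident`.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

fn parse(tts: &[&str], ms: &[Matcher]) -> ParseResult {
    let mut bindings = HashMap::new();
    let mut input = tts.iter();
    for m in ms {
        let tok = match input.next() {
            Some(tok) => *tok,
            None => return ParseResult::Failure("unexpected end of invocation".to_string()),
        };
        match m {
            Matcher::Literal(lit) => {
                if tok != *lit {
                    return ParseResult::Failure(format!("no rule expected token `{}`", tok));
                }
            }
            Matcher::Ident(name) => {
                if !is_ident(tok) {
                    return ParseResult::Failure(format!("expected ident, found `{}`", tok));
                }
                bindings.insert(name.to_string(), tok.to_string());
            }
        }
    }
    match input.next() {
        Some(extra) => ParseResult::Failure(format!("no rule expected token `{}`", extra)),
        None => ParseResult::Success(bindings),
    }
}

// The matchers of the two arms of the example definition.
fn arm_one() -> Vec<Matcher> {
    vec![Matcher::Literal("print"), Matcher::Ident("mvar")]
}

fn arm_two() -> Vec<Matcher> {
    vec![Matcher::Literal("print"), Matcher::Literal("twice"), Matcher::Ident("mvar")]
}

fn main() {
    let invocation = ["print", "foo"];
    println!("{:?}", parse(&invocation, &arm_one()));
    println!("{:?}", parse(&invocation, &arm_two()));
}
```

For the example invocation, the first arm's matcher succeeds with mvar bound to foo, while the second arm's matcher fails on the token foo where it expected twice.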

For more information about the macro parser's implementation, see the comments in src/libsyntax/ext/tt/macro_parser.rs.

Hygiene

If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code:

#define DEFINE_FOO struct Bar {int x;}; struct Foo {struct Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO

Most people avoid writing C like this – and for good reason: it doesn't compile. The struct Bar defined by the macro clashes names with the struct Bar defined in the code. Consider also the following example:

#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
    }

// Then elsewhere
int y = 22;
DO_FOO(y);

Do you see the problem? We wanted to generate a call foo(22, 0), but instead we got foo(0, 0) because the macro defined its own y!

These are both examples of macro hygiene issues. Hygiene relates to how to handle names defined within a macro. In particular, a hygienic macro system prevents errors due to names introduced within a macro. Rust macros are hygienic in that they do not allow one to write the sorts of bugs above.
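For comparison, here is a Rust analogue of the DO_FOO example. The names foo, do_foo and call_it are chosen for this illustration; the point is that the y introduced inside the macro body does not capture the y passed in by the caller.

```rust
fn foo(a: i32, b: i32) -> (i32, i32) {
    (a, b)
}

macro_rules! do_foo {
    ($x:expr) => {{
        let y = 0; // this `y` belongs to the macro definition's context
        foo($x, y)
    }};
}

fn call_it() -> (i32, i32) {
    let y = 22;
    // `$x` expands to the caller's `y`, not the macro's own `y`.
    do_foo!(y)
}

fn main() {
    println!("{:?}", call_it());
}
```

Here the call evaluates to foo(22, 0), which is the result the C version failed to produce.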

At a high level, hygiene within the Rust compiler is accomplished by keeping track of the context where a name is introduced and used. We can then disambiguate names based on that context. Future iterations of the macro system will allow greater control to the macro author to use that context. For example, a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro).

In rustc, this "context" is tracked via Spans.

TODO: what is call-site hygiene? what is def-site hygiene?

TODO

Procedural Macros

TODO

Custom Derive

TODO

TODO: maybe something about macros 2.0?

Name resolution

Name resolution is a two-phase process. In the first phase, which runs during macro expansion, we build a tree of modules and resolve imports. Macro expansion and name resolution communicate with each other via the Resolver trait, defined in libsyntax.

The input to the second phase is the syntax tree, produced by parsing input files and expanding macros. This phase produces links from all the names in the source to relevant places where the name was introduced. It also generates helpful error messages, like typo suggestions, traits to import or lints about unused items.

A successful run of the second phase (Resolver::resolve_crate) creates a kind of index that the rest of the compilation may use to look up the names that are present (through the hir::lowering::Resolver interface).

Name resolution lives in the librustc_resolve crate, with the meat in lib.rs and some helpers or symbol-type-specific logic in the other modules.

Namespaces

Different kinds of symbols live in different namespaces ‒ e.g. types don't clash with variables. A clash usually doesn't come up anyway, because variables start with a lower-case letter while types start with an upper-case one, but this is only a convention. This is legal Rust code that'll compile (with warnings):


#![allow(unused_variables)]
fn main() {
    type x = u32;
    let x: x = 1;
    let y: x = 2; // See? x is still a type here.
}

To cope with this, and with slightly different scoping rules for these namespaces, the resolver keeps them separated and builds separate structures for them.

In other words, when the code talks about namespaces, it doesn't mean the module hierarchy, it's types vs. values vs. macros.

Scopes and ribs

A name is visible only in a certain area of the source code. This forms a hierarchical structure, but not necessarily a simple one ‒ if one scope is part of another, it doesn't mean the name visible in the outer one is also visible in the inner one, or that it refers to the same thing.

To cope with that, the compiler introduces the concept of Ribs. A rib is an abstraction of a scope. Every time the set of visible names potentially changes, a new rib is pushed onto a stack. The places where this can happen include, for example:

  • The obvious places ‒ curly braces enclosing a block, function boundaries, modules.
  • Introducing a let binding ‒ this can shadow another binding with the same name.
  • Macro expansion border ‒ to cope with macro hygiene.

When searching for a name, the stack of ribs is traversed from the innermost outwards. This helps to find the closest meaning of the name (the one not shadowed by anything else). The transition to an outer rib may also change the rules for which names are usable ‒ if there are nested functions (not closures), the inner one can't access parameters and local bindings of the outer one, even though they should be visible by ordinary scoping rules. An example:


#![allow(unused_variables)]
fn main() {
    fn do_something<T: Default>(val: T) { // <- New rib in both types and values (1)
        // `val` is accessible, as is the helper function
        // `T` is accessible
        let helper = || { // New rib on `helper` (2) and another on the block (3)
            // `val` is accessible here
        }; // End of (3)
        // `val` is accessible, `helper` variable shadows `helper` function
        fn helper() { // <- New rib in both types and values (4)
            // `val` is not accessible here ((4) is not transparent for locals)
            // `T` is not accessible here
        } // End of (4)
        let val = T::default(); // New rib (5)
        // `val` is the variable, not the parameter here
    } // End of (5), (2) and (1)
}

Because the rules for different namespaces are a bit different, each namespace has its own independent rib stack that is constructed in parallel to the others. In addition, there's also a rib stack for local labels (eg. names of loops or blocks), which isn't a full namespace in its own right.
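The innermost-outwards lookup can be modelled with a small rib stack. The sketch below only covers local bindings in the value namespace, and its types (RibKind, Rib) and functions (make_rib, resolve_local, demo) are hypothetical simplifications rather than the structures in librustc_resolve.

```rust
use std::collections::HashMap;

#[derive(PartialEq)]
enum RibKind {
    Normal,
    // A function item boundary: locals of enclosing functions are not visible past it.
    FnItem,
}

struct Rib {
    kind: RibKind,
    bindings: HashMap<&'static str, u32>, // name -> made-up binding id
}

fn make_rib(kind: RibKind, names: &[(&'static str, u32)]) -> Rib {
    Rib { kind, bindings: names.iter().cloned().collect() }
}

// Walk the ribs from the innermost outwards.
fn resolve_local(ribs: &[Rib], name: &str) -> Option<u32> {
    for rib in ribs.iter().rev() {
        if let Some(&id) = rib.bindings.get(name) {
            return Some(id);
        }
        if rib.kind == RibKind::FnItem {
            return None; // outer locals are hidden from a nested fn item
        }
    }
    None
}

// Mirrors the numbered ribs of the `do_something` example.
fn demo() -> (Option<u32>, Option<u32>, Option<u32>) {
    let mut ribs = vec![
        make_rib(RibKind::FnItem, &[("val", 1)]),    // (1) the parameter `val`
        make_rib(RibKind::Normal, &[("helper", 2)]), // (2) `let helper`
    ];
    let in_body = resolve_local(&ribs, "val");

    ribs.push(make_rib(RibKind::FnItem, &[])); // (4) nested `fn helper`
    let in_nested_fn = resolve_local(&ribs, "val");
    ribs.pop();

    ribs.push(make_rib(RibKind::Normal, &[("val", 3)])); // (5) `let val`
    let after_shadowing = resolve_local(&ribs, "val");
    (in_body, in_nested_fn, after_shadowing)
}

fn main() {
    println!("{:?}", demo());
}
```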

Overall strategy

To perform the name resolution of the whole crate, the syntax tree is traversed top-down and every encountered name is resolved. This works for most kinds of names, because at the point of use of a name it is already introduced in the Rib hierarchy.

There are some exceptions to this. Items are a bit tricky, because they can be used even before they are encountered ‒ therefore every block needs to be first scanned for items to fill in its Rib.
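The item case is easy to observe from ordinary Rust. In this small example (the function names are made up), the call comes before the definition inside the same block, which only works because the block's items are known before its statements are resolved:

```rust
fn uses_item_before_definition() -> u32 {
    // `helper` is called here although its definition comes later in the block.
    let doubled = helper(21);

    fn helper(n: u32) -> u32 {
        n * 2
    }

    doubled
}
```

A `let` binding used before its declaration would be rejected in the same position; only items get this treatment.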

Other, even more problematic ones, are imports, which need recursive fixed-point resolution, and macros, which need to be resolved and expanded before the rest of the code can be processed.

Therefore, the resolution is performed in multiple stages.

TODO:

This is a result of the first pass of learning the code. It is definitely incomplete and not detailed enough. It also might be inaccurate in places. Still, it probably provides a useful first guidepost to what happens in there.

  • What exactly does it link to and how is that published and consumed by following stages of compilation?
  • Who calls it and how is it actually used?
  • Is it a pass and then the result is only used, or can it be computed incrementally (e.g. for RLS)?
  • The overall strategy description is a bit vague.
  • Where does the name Rib come from?
  • Does this thing have its own tests, or is it tested only as part of some e2e testing?

The HIR

The HIR – "High-Level Intermediate Representation" – is the primary IR used in most of rustc. It is a compiler-friendly representation of the abstract syntax tree (AST) that is generated after parsing, macro expansion, and name resolution (see Lowering for how the HIR is created). Many parts of HIR resemble Rust surface syntax quite closely, with the exception that some of Rust's expression forms have been desugared away. For example, for loops are converted into a loop and do not appear in the HIR. This makes HIR more amenable to analysis than a normal AST.

This chapter covers the main concepts of the HIR.

You can view the HIR representation of your code by passing the -Zunpretty=hir-tree flag to rustc:

> cargo rustc -- -Zunpretty=hir-tree

Out-of-band storage and the Crate type

The top-level data-structure in the HIR is the Crate, which stores the contents of the crate currently being compiled (we only ever construct HIR for the current crate). Whereas in the AST the crate data structure basically just contains the root module, the HIR Crate structure contains a number of maps and other things that serve to organize the content of the crate for easier access.

For example, the contents of individual items (e.g. modules, functions, traits, impls, etc) in the HIR are not immediately accessible in the parents. So, for example, if there is a module item foo containing a function bar():


mod foo {
    fn bar() { }
}

then in the HIR the representation of module foo (the Mod struct) would only have the ItemId I of bar(). To get the details of the function bar(), we would look up I in the items map.

One nice result from this representation is that one can iterate over all items in the crate by iterating over the key-value pairs in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).

The other reason to set up the representation this way is for better integration with incremental compilation. This way, if you gain access to an &hir::Item (e.g. for the mod foo), you do not immediately gain access to the contents of the function bar(). Instead, you only gain access to the id for bar(), and you must invoke some function to look up the contents of bar() given its id; this gives the compiler a chance to observe that you accessed the data for bar(), and then record the dependency.
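As a rough illustration of out-of-band storage, here is a toy model. It is not the real `hir::Crate`: the types `ItemId`, `ItemKind` and `Crate` below are simplified stand-ins, and the `accessed` log is an invented placeholder for dependency tracking.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct ItemId(u32);

enum ItemKind {
    /// A module only stores the ids of its children, not their contents.
    Mod { item_ids: Vec<ItemId> },
    Fn { name: &'static str },
}

struct Crate {
    /// All items of the crate, stored out of band and keyed by id.
    items: BTreeMap<ItemId, ItemKind>,
    /// Toy stand-in for dependency tracking: every lookup is logged here.
    accessed: RefCell<Vec<ItemId>>,
}

impl Crate {
    /// The only way to reach an item's contents, so every access can be observed.
    fn item(&self, id: ItemId) -> &ItemKind {
        self.accessed.borrow_mut().push(id);
        &self.items[&id]
    }
}

/// Models `mod foo { fn bar() { } }`: id 0 is `foo`, id 1 is `bar`.
fn example_crate() -> Crate {
    let mut items = BTreeMap::new();
    items.insert(ItemId(0), ItemKind::Mod { item_ids: vec![ItemId(1)] });
    items.insert(ItemId(1), ItemKind::Fn { name: "bar" });
    Crate { items, accessed: RefCell::new(Vec::new()) }
}
```

Looking at `foo` yields only `ItemId(1)`; the contents of `bar` are reached by a second call to `item`, which shows up in the log. Iterating over `items` visits every item without walking the tree.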

Identifiers in the HIR

Most of the code that has to deal with things in HIR tends not to carry around references into the HIR, but rather to carry around identifier numbers (or just "ids"). Right now, you will find four sorts of identifiers in active use:

  • DefId, which primarily names "definitions" or top-level items.
    • You can think of a DefId as being shorthand for a very explicit and complete path, like std::collections::HashMap. However, these paths are able to name things that are not nameable in normal Rust (e.g. impls), and they also include extra information about the crate (such as its version number, as two versions of the same crate can co-exist).
    • A DefId really consists of two parts, a CrateNum (which identifies the crate) and a DefIndex (which indexes into a list of items that is maintained per crate).
  • HirId, which combines the index of a particular item with an offset within that item.
    • The key point of a HirId is that it is relative to some item (which is named via a DefId).
  • BodyId, which is an absolute identifier that refers to a specific body (definition of a function or constant) in the crate. It is currently effectively a "newtype'd" NodeId.
  • NodeId, which is an absolute id that identifies a single node in the HIR tree.
    • While these are still in common use, they are being slowly phased out.
    • Since they are absolute within the crate, adding a new node anywhere in the tree causes the NodeIds of all subsequent code in the crate to change. This is terrible for incremental compilation, as you can perhaps imagine.
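The difference between absolute and item-relative numbering can be shown with a toy model. The sketch below does not use the compiler's real id types; it simply numbers strings, with `absolute_ids` playing the role of `NodeId` and `relative_ids` the role of `HirId` (both function names are invented here).

```rust
/// Each inner Vec lists the nodes of one item, in source order (toy model).
type Items = Vec<Vec<&'static str>>;

/// Absolute numbering, in the spirit of `NodeId`: one counter for the whole crate.
fn absolute_ids(items: &Items) -> Vec<(&'static str, usize)> {
    items
        .iter()
        .flatten()
        .enumerate()
        .map(|(id, &node)| (node, id))
        .collect()
}

/// Owner-relative numbering, in the spirit of `HirId`: (item index, offset within item).
fn relative_ids(items: &Items) -> Vec<(&'static str, (usize, usize))> {
    items
        .iter()
        .enumerate()
        .flat_map(|(owner, nodes)| {
            nodes
                .iter()
                .enumerate()
                .map(move |(local, &node)| (node, (owner, local)))
        })
        .collect()
}
```

If a node is added to the first item, the absolute id of every node in the second item shifts by one, while its relative id stays the same. That stability is what incremental compilation benefits from.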

The HIR Map

Most of the time when you are working with the HIR, you will do so via the HIR Map, accessible in the tcx via tcx.hir (and defined in the hir::map module). The HIR map contains a number of methods to convert between IDs of various kinds and to look up data associated with an HIR node.

For example, if you have a DefId, and you would like to convert it to a NodeId, you can use tcx.hir.as_local_node_id(def_id). This returns an Option<NodeId> – this will be None if the def-id refers to something outside of the current crate (since then it has no HIR node), but otherwise returns Some(n) where n is the node-id of the definition.

Similarly, you can use tcx.hir.find(n) to look up the node for a NodeId. This returns an Option<Node<'tcx>>, where Node is an enum defined in the map; by matching on this you can find out what sort of node the node-id referred to and also get a pointer to the data itself. Often, you know what sort of node n is – e.g. if you know that n must be some HIR expression, you can do tcx.hir.expect_expr(n), which will extract and return the &hir::Expr, panicking if n is not in fact an expression.

Finally, you can use the HIR map to find the parents of nodes, via calls like tcx.hir.get_parent_node(n).

HIR Bodies

A hir::Body represents some kind of executable code, such as the body of a function/closure or the definition of a constant. Bodies are associated with an owner, which is typically some kind of item (e.g. an fn() or const), but could also be a closure expression (e.g. |x, y| x + y). You can use the HIR map to find the body associated with a given def-id (maybe_body_owned_by) or to find the owner of a body (body_owner_def_id).

Lowering

The lowering step converts AST to HIR. This means many structures are removed if they are irrelevant for type analysis or similar syntax-agnostic analyses. Examples of such structures include, but are not limited to:

  • Parentheses
    • Removed without replacement; the tree structure makes the order explicit
  • for loops and while (let) loops
    • Converted to loop + match and some let bindings
  • if let
    • Converted to match
  • Universal impl Trait
    • Converted to generic arguments (but with some flags, to know that the user didn't write them)
  • Existential impl Trait
    • Converted to a virtual existential type declaration
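To get a feel for two of these conversions, the desugared forms can be written by hand in ordinary Rust. This is only an approximation of what lowering produces (the real output also carries spans, hygiene information and extra bindings), and the function names are invented for the example.

```rust
/// Surface syntax: a `for` loop.
fn sum_with_for(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

/// Roughly what the loop turns into: `loop` + `match` on `Iterator::next`.
fn sum_desugared(values: &[u32]) -> u32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match Iterator::next(&mut iter) {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}

/// An `if let Some(v) = values.first() { *v } else { 0 }` written as the
/// `match` it is converted to.
fn first_or_zero(values: &[u32]) -> u32 {
    match values.first() {
        Some(v) => *v,
        _ => 0,
    }
}
```

Both summing functions behave identically, which is why later analyses only need to understand `loop` and `match`.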

Lowering needs to uphold several invariants in order to not trigger the sanity checks in src/librustc/hir/map/hir_id_validator.rs:

  1. A HirId must be used if created. So if you call lower_node_id, you must use the resulting NodeId or HirId (either is fine, since any NodeIds in the HIR are checked for existing HirIds).
  2. Lowering a HirId must be done in the scope of the owning item. This means you need to use with_hir_id_owner if you are creating parts of an item other than the one being currently lowered. This happens for example during the lowering of existential impl Trait.
  3. A NodeId that will be placed into a HIR structure must be lowered, even if its HirId is unused. Calling let _ = self.lower_node_id(node_id); is perfectly legitimate.
  4. If you are creating new nodes that didn't exist in the AST, you must create new ids for them. This is done by calling the next_id method, which produces both a new NodeId as well as automatically lowering it for you so you also get the HirId.

If you are creating new DefIds, since each DefId needs to have a corresponding NodeId, it is advisable to add these NodeIds to the AST so you don't have to generate new ones during lowering. This has the advantage of creating a way to find the DefId of something via its NodeId. If lowering needs this DefId in multiple places, you can't generate a new NodeId in all those places because you'd also get a new DefId then. With a NodeId from the AST this is not an issue.

Having the NodeId also allows the DefCollector to generate the DefIds instead of lowering having to do it on the fly. Centralizing the DefId generation in one place makes it easier to refactor and reason about.

The ty module: representing types

The ty module defines how the Rust compiler represents types internally. It also defines the typing context (tcx or TyCtxt), which is the central data structure in the compiler.

The tcx and how it uses lifetimes

The tcx ("typing context") is the central data structure in the compiler. It is the context that you use to perform all manner of queries. The struct TyCtxt defines a reference to this shared context:

tcx: TyCtxt<'a, 'gcx, 'tcx>
//          --  ----  ----
//          |   |     |
//          |   |     innermost arena lifetime (if any)
//          |   "global arena" lifetime
//          lifetime of this reference

As you can see, the TyCtxt type takes three lifetime parameters. These lifetimes are perhaps the most complex thing to understand about the tcx. During Rust compilation, we allocate most of our memory in arenas, which are basically pools of memory that get freed all at once. When you see a reference with a lifetime like 'tcx or 'gcx, you know that it refers to arena-allocated data (or data that lives as long as the arenas, anyhow).

We use two distinct levels of arenas. The outer level is the "global arena". This arena lasts for the entire compilation: so anything you allocate in there is only freed once compilation is basically over (actually, when we shift to executing LLVM).

To reduce peak memory usage, when we do type inference, we also use an inner level of arena. These arenas get thrown away once type inference is over. This is done because type inference generates a lot of "throw-away" types that are not particularly interesting after type inference completes, so keeping around those allocations would be wasteful.

Often, we wish to write code that explicitly asserts that it is not taking place during inference. In that case, there is no "local" arena, and all the types that you can access are allocated in the global arena. To express this, the idea is to use the same lifetime for the 'gcx and 'tcx parameters of TyCtxt. Just to be a touch confusing, we tend to use the name 'tcx in such contexts. Here is an example:

fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) {
    //                                        ----  ----
    //                                        Using the same lifetime here asserts
    //                                        that the innermost arena accessible through
    //                                        this reference *is* the global arena.
}

In contrast, if we want code that can be used during type inference, then we need to declare distinct 'gcx and 'tcx lifetime parameters:

fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) {
    //                                                ----  ----
    //                                        Using different lifetimes here means that
    //                                        the innermost arena *may* be distinct
    //                                        from the global arena (but doesn't have to be).
}

Allocating and working with types

Rust types are represented using the Ty<'tcx> defined in the ty module (not to be confused with the Ty struct from the HIR). This is in fact a simple type alias for a reference with 'tcx lifetime:

pub type Ty<'tcx> = &'tcx TyS<'tcx>;

You can basically ignore the TyS struct – you will almost never access it explicitly. We always pass it by reference using the Ty<'tcx> alias – the only exception I think is to define inherent methods on types. Instances of TyS are only ever allocated in one of the rustc arenas (never e.g. on the stack).

One common operation on types is to match and see what kinds of types they are. This is done by doing match ty.sty, sort of like this:

fn test_type<'tcx>(ty: Ty<'tcx>) {
    match ty.sty {
        ty::TyArray(elem_ty, len) => { ... }
        ...
    }
}

The sty field (the origin of this name is unclear to me; perhaps structural type?) is of type TyKind<'tcx>, which is an enum defining all of the different kinds of types in the compiler.

N.B. Inspecting the sty field on types during type inference can be risky, as there may be inference variables and other things to consider, or sometimes types are not yet known that will become known later.

To allocate a new type, you can use the various mk_ methods defined on the tcx. These have names that correspond mostly to the various kinds of type variants. For example:

let array_ty = tcx.mk_array(elem_ty, len * 2);

These methods all return a Ty<'tcx> – note that the lifetime you get back is the lifetime of the innermost arena that this tcx has access to. In fact, types are always canonicalized and interned (so we never allocate exactly the same type twice) and are always allocated in the outermost arena where they can be (so, if they do not contain any inference variables or other "temporary" types, they will be allocated in the global arena). However, the lifetime 'tcx is always a safe approximation, so that is what you get back.

N.B. Because types are interned, it is possible to compare them for equality efficiently using == – however, this is almost never what you want to do unless you happen to be hashing and looking for duplicates. This is because often in Rust there are multiple ways to represent the same type, particularly once inference is involved. If you are going to be testing for type equality, you probably need to start looking into the inference code to do it right.
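Interning itself is a general technique, and a minimal sketch may help. The code below is not how rustc does it: it uses `Rc` and a `HashSet` instead of arenas and `'tcx` references, and the names `Kind` and `Interner` are invented for the illustration.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Toy type representation; `Rc<Kind>` stands in for `Ty<'tcx>`.
#[derive(PartialEq, Eq, Hash, Debug)]
enum Kind {
    U32,
    Array(Rc<Kind>, u64),
}

#[derive(Default)]
struct Interner {
    set: HashSet<Rc<Kind>>,
}

impl Interner {
    /// Return the unique shared allocation for `kind`, creating it on first use.
    fn intern(&mut self, kind: Kind) -> Rc<Kind> {
        if let Some(existing) = self.set.get(&kind) {
            return Rc::clone(existing);
        }
        let fresh = Rc::new(kind);
        self.set.insert(Rc::clone(&fresh));
        fresh
    }
}
```

Interning `Array(u32, 4)` twice yields the same allocation, so `Rc::ptr_eq` is enough to compare the results. As the note above says, such a comparison only answers "same representation", not "same type" in the sense the type checker cares about.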

You can also find various common types in the tcx itself by accessing tcx.types.bool, tcx.types.char, etc (see CommonTypes for more).

Beyond types: other kinds of arena-allocated data structures

In addition to types, there are a number of other arena-allocated data structures that you can allocate, and which are found in this module. Here are a few examples:

  • Substs, allocated with mk_substs – this will intern a slice of types, often used to specify the values to be substituted for generics (e.g. HashMap<i32, u32> would be represented as a slice &'tcx [tcx.types.i32, tcx.types.u32]).
  • TraitRef, typically passed by value – a trait reference consists of a reference to a trait along with its various type parameters (including Self), like i32: Display (here, the def-id would reference the Display trait, and the substs would contain i32).
  • Predicate defines something the trait system has to prove (see traits module).

Import conventions

Although there is no hard and fast rule, the ty module tends to be used like so:

use ty::{self, Ty, TyCtxt};

In particular, since they are so common, the Ty and TyCtxt types are imported directly. Other types are often referenced with an explicit ty:: prefix (e.g. ty::TraitRef<'tcx>). But some modules choose to import a larger or smaller set of names explicitly.

Type inference

Type inference is the process of automatic detection of the type of an expression.

It is what allows Rust to work with fewer or no type annotations, making things easier for users:

fn main() {
    let mut things = vec![];
    things.push("thing")
}

Here, the type of things is inferred to be Vec<&str> because of the value we push into things.

The type inference is based on the standard Hindley-Milner (HM) type inference algorithm, but extended in various ways to accommodate subtyping, region inference, and higher-ranked types.

A note on terminology

We use the notation ?T to refer to inference variables, also called existential variables.

We use the terms "region" and "lifetime" interchangeably. Both refer to the 'a in &'a T.

The term "bound region" refers to a region that is bound in a function signature, such as the 'a in for<'a> fn(&'a u32). A region is "free" if it is not bound.

Creating an inference context

You create and "enter" an inference context by doing something like the following:

tcx.infer_ctxt().enter(|infcx| {
    // Use the inference context `infcx` here.
})

Each inference context creates a short-lived type arena to store the fresh types and things that it will create, as described in the chapter on the ty module. This arena is created by the enter function and disposed of after it returns.

Within the closure, infcx has the type InferCtxt<'cx, 'gcx, 'tcx> for some fresh 'cx and 'tcx – the latter corresponds to the lifetime of this temporary arena, and the 'cx is the lifetime of the InferCtxt itself. (Again, see the ty chapter for more details on this setup.)

The tcx.infer_ctxt method actually returns a builder, which means there are some kinds of configuration you can do before the infcx is created. See InferCtxtBuilder for more information.

Inference variables

The main purpose of the inference context is to house a bunch of inference variables – these represent types or regions whose precise value is not yet known, but will be uncovered as we perform type-checking.

If you're familiar with the basic ideas of unification from H-M type systems, or logic languages like Prolog, this is the same concept. If you're not, you might want to read a tutorial on how H-M type inference works, or perhaps this blog post on unification in the Chalk project.

All told, the inference context stores four kinds of inference variables as of this writing:

  • Type variables, which come in three varieties:
    • General type variables (the most common). These can be unified with any type.
    • Integral type variables, which can only be unified with an integral type, and arise from an integer literal expression like 22.
    • Float type variables, which can only be unified with a float type, and arise from a float literal expression like 22.0.
  • Region variables, which represent lifetimes, and arise all over the place.
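The effect of integral and float variables is visible from user code. In the following small example (the function name is made up), one literal is later unified with u8, while the other two are never constrained and fall back to the language defaults i32 and f64:

```rust
/// Returns the sizes in bytes of three bindings initialised from literals.
fn literal_sizes() -> (usize, usize, usize) {
    let a = 22; // integral variable, unified with `u8` by the annotation below
    let b = 22; // never constrained: falls back to `i32`
    let c = 22.0; // float variable, never constrained: falls back to `f64`
    let _forces_u8: u8 = a;
    (
        std::mem::size_of_val(&a),
        std::mem::size_of_val(&b),
        std::mem::size_of_val(&c),
    )
}
```

Calling `literal_sizes()` returns `(1, 4, 8)`.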

All the type variables work in much the same way: you can create a new type variable, and what you get is a Ty<'tcx> representing an unresolved type ?T. Then later you can apply the various operations that the inferencer supports, such as equality or subtyping, and it will possibly instantiate (or bind) that ?T to a specific value as a result.

The region variables work somewhat differently, and are described below in a separate section.

Enforcing equality / subtyping

The most basic operation you can perform in the type inferencer is equality, which forces two types T and U to be the same. The recommended way to add an equality constraint is to use the at method, roughly like so:

infcx.at(...).eq(t, u);

The first at() call provides a bit of context, i.e. why you are doing this unification, and in what environment, and the eq method performs the actual equality constraint.

When you equate things, you force them to be precisely equal. Equating returns an InferResult – if it returns Err(err), then equating failed, and the enclosed TypeError will tell you what went wrong.

The success case is perhaps more interesting. The "primary" return type of eq is () – that is, when it succeeds, it doesn't return a value of any particular interest. Rather, it is executed for its side-effects of constraining type variables and so forth. However, the actual return type is not (), but rather InferOk<()>. The InferOk type is used to carry extra trait obligations – your job is to ensure that these are fulfilled (typically by enrolling them in a fulfillment context). See the trait chapter for more background on that.

You can similarly enforce subtyping through infcx.at(..).sub(..). The same basic concepts as above apply.
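For readers new to unification, here is a minimal, self-contained sketch of equating types with inference variables. It is a toy: the `Ty` and `Table` types are invented for the example, there is no subtyping, no regions, no obligations and no occurs check, and it should not be read as the inference context's real implementation.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Var(usize), // inference variable ?N
    Int,
    Bool,
    Ref(Box<Ty>),
}

#[derive(Default)]
struct Table {
    /// `bindings[n]` is the value of `?n`, if it has been instantiated.
    bindings: Vec<Option<Ty>>,
}

impl Table {
    /// Create a fresh, unbound inference variable.
    fn fresh(&mut self) -> Ty {
        self.bindings.push(None);
        Ty::Var(self.bindings.len() - 1)
    }

    /// Replace bound variables by their values, recursively.
    fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(v) => match &self.bindings[*v] {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::Ref(inner) => Ty::Ref(Box::new(self.resolve(inner))),
            _ => ty.clone(),
        }
    }

    /// Force `a` and `b` to be the same type, binding variables as a side effect.
    fn eq(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        match (self.resolve(a), self.resolve(b)) {
            (Ty::Var(x), Ty::Var(y)) if x == y => Ok(()),
            (Ty::Var(x), other) | (other, Ty::Var(x)) => {
                self.bindings[x] = Some(other);
                Ok(())
            }
            (Ty::Ref(x), Ty::Ref(y)) => self.eq(&x, &y),
            (x, y) if x == y => Ok(()),
            (x, y) => Err(format!("expected {:?}, found {:?}", x, y)),
        }
    }
}
```

Equating `&?0` with `&Int` succeeds and binds `?0` to `Int` purely as a side effect; a later attempt to equate `?0` with `Bool` then fails with an error, mirroring the `Ok`/`Err` split described above.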

"Trying" equality

Sometimes you would like to know if it is possible to equate two types without error. You can test that with infcx.can_eq (or infcx.can_sub for subtyping). If this returns Ok, then equality is possible – but in all cases, any side-effects are reversed.

Be aware, though, that the success or failure of these methods is always modulo regions. That is, two types &'a u32 and &'b u32 will return Ok for can_eq, even if 'a != 'b. This falls out from the "two-phase" nature of how we solve region constraints.

Snapshots

As described in the previous section on can_eq, often it is useful to be able to do a series of operations and then roll back their side-effects. This is done for various reasons: one of them is to be able to backtrack, trying out multiple possibilities before settling on which path to take. Another is in order to ensure that a series of smaller changes take place atomically or not at all.

To allow for this, the inference context supports a snapshot method. When you call it, it will start recording changes that occur from the operations you perform. When you are done, you can either invoke rollback_to, which will undo those changes, or else confirm, which will make them permanent. Snapshots can be nested as long as you follow a stack-like discipline.

Rather than use snapshots directly, it is often helpful to use methods like commit_if_ok or probe that encapsulate higher-level patterns.
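The snapshot idea is essentially an undo log. The following sketch shows the pattern on a toy binding table. The `Bindings` and `Snapshot` types are invented; the method names echo the ones mentioned above but their signatures here are assumptions, and the model presumes each variable is bound at most once.

```rust
#[derive(Default)]
struct Bindings {
    values: Vec<Option<&'static str>>,
    /// Indices of variables that were bound, in order of binding.
    undo_log: Vec<usize>,
}

/// Remembers how long the undo log was when the snapshot was taken.
struct Snapshot {
    undo_len: usize,
}

impl Bindings {
    fn new_var(&mut self) -> usize {
        self.values.push(None);
        self.values.len() - 1
    }

    fn bind(&mut self, var: usize, ty: &'static str) {
        self.values[var] = Some(ty);
        self.undo_log.push(var);
    }

    fn snapshot(&self) -> Snapshot {
        Snapshot { undo_len: self.undo_log.len() }
    }

    /// Undo every binding made since `snapshot` was taken.
    fn rollback_to(&mut self, snapshot: Snapshot) {
        while self.undo_log.len() > snapshot.undo_len {
            let var = self.undo_log.pop().unwrap();
            self.values[var] = None;
        }
    }

    /// Run `f`; keep its bindings on `Ok`, undo them on `Err`.
    fn commit_if_ok<T, E>(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<T, E>,
    ) -> Result<T, E> {
        let snapshot = self.snapshot();
        let result = f(self);
        if result.is_err() {
            self.rollback_to(snapshot);
        }
        result
    }
}
```

In this model, committing is simply dropping the snapshot without rolling back. Nested snapshots work as long as inner ones are resolved before outer ones, which is the stack-like discipline mentioned above.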

Subtyping obligations

One thing worth discussing is subtyping obligations. When you force two types to be in a subtype relation, like ?T <: i32, we can often convert those into equality constraints. This follows from Rust's rather limited notion of subtyping: so, in the above case, ?T <: i32 is equivalent to ?T = i32.

However, in some cases we have to be more careful. For example, when regions are involved. So if you have ?T <: &'a i32, what we would do is to first "generalize" &'a i32 into a type with a region variable: &'?b i32, and then unify ?T with that (?T = &'?b i32). We then relate this new variable with the original bound:

&'?b i32 <: &'a i32

This will result in a region constraint (see below) of '?b: 'a.

One final interesting case is relating two unbound type variables, like ?T <: ?U. In that case, we can't make progress, so we enqueue an obligation Subtype(?T, ?U) and return it via the InferOk mechanism. You'll have to try again when more details about ?T or ?U are known.

Region constraints

Regions are inferred somewhat differently from types. Rather than eagerly unifying things, we simply collect constraints as we go, but make (almost) no attempt to solve regions. These constraints have the form of an "outlives" constraint:

'a: 'b

Actually the code tends to view them as a subregion relation, but it's the same idea:

'b <= 'a

(There are various other kinds of constraints, such as "verifys"; see the region_constraints module for details.)

There is one case where we do some amount of eager unification. If you have an equality constraint between two regions

'a = 'b

we will record that fact in a unification table. You can then use opportunistic_resolve_var to convert 'b to 'a (or vice versa). This is sometimes needed to ensure termination of fixed-point algorithms.

Extracting region constraints

Ultimately, region constraints are only solved at the very end of type-checking, once all other constraints are known. There are two ways to solve region constraints right now: lexical and non-lexical. Eventually there will only be one.

To solve lexical region constraints, you invoke resolve_regions_and_report_errors. This "closes" the region constraint process and invokes the lexical_region_resolve code. Once this is done, any further attempt to equate or create a subtyping relationship will yield an ICE.

Non-lexical region constraints are not handled within the inference context. Instead, the NLL solver (actually, the MIR type-checker) invokes take_and_reset_region_constraints periodically. This extracts all of the outlives constraints from the region solver, but leaves the set of variables intact. This is used to get just the region constraints that resulted from some particular point in the program, since the NLL solver needs to know not just what regions were subregions but where. Finally, the NLL solver invokes take_region_var_origins, which "closes" the region constraint process in the same way as normal solving.

Lexical region resolution

Lexical region resolution is done by initially assigning each region variable to an empty value. We then process each outlives constraint repeatedly, growing region variables until a fixed-point is reached. Region variables can be grown using a least-upper-bound relation on the region lattice in a fairly straightforward fashion.
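The fixed-point iteration can be sketched on a simplified model. Here a region is modeled as a set of program points (an assumption made for the example, not the compiler's representation), an outlives constraint `'a: 'b` requires the set for `'a` to contain the set for `'b`, and growing a variable means taking the union, which is the least upper bound for sets. The names `Outlives` and `solve` are invented.

```rust
use std::collections::BTreeSet;

/// `(longer, shorter)` encodes the outlives constraint `'longer: 'shorter`.
type Outlives = (usize, usize);

/// Grow regions until every constraint `'longer: 'shorter` holds,
/// i.e. until `regions[longer]` contains all points of `regions[shorter]`.
fn solve(mut regions: Vec<BTreeSet<u32>>, constraints: &[Outlives]) -> Vec<BTreeSet<u32>> {
    let mut changed = true;
    while changed {
        changed = false;
        for &(longer, shorter) in constraints {
            let needed = regions[shorter].clone();
            let before = regions[longer].len();
            regions[longer].extend(needed);
            changed |= regions[longer].len() != before;
        }
    }
    regions
}
```

Region variables start out empty and only ever grow, so the loop terminates once a full pass over the constraints changes nothing.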

Trait resolution (old-style)

This chapter describes the general process of trait resolution and points out some non-obvious things.

Note: This chapter (and its subchapters) describe how the trait solver currently works. However, we are in the process of designing a new trait solver. If you'd prefer to read about that, see this traits chapter.

Major concepts

Trait resolution is the process of pairing up an impl with each reference to a trait. So, for example, if there is a generic function like:

fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> { ... }

and then a call to that function:

let v: Vec<isize> = clone_slice(&[1, 2, 3]);

it is the job of trait resolution to figure out whether there exists an impl of (in this case) isize : Clone.

Note that in some cases, like generic functions, we may not be able to find a specific impl, but we can figure out that the caller must provide an impl. For example, consider the body of clone_slice:

fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> {
    let mut v = Vec::new();
    for e in x {
        v.push((*e).clone()); // (*)
    }
    v
}

The line marked (*) is only legal if T (the type of *e) implements the Clone trait. Naturally, since we don't know what T is, we can't find the specific impl; but based on the bound T:Clone, we can say that there exists an impl which the caller must provide.

We use the term obligation to refer to a trait reference in need of an impl. Basically, the trait resolution system resolves an obligation by proving that an appropriate impl does exist.

During type checking, we do not store the results of trait selection. We simply wish to verify that trait selection will succeed. Then later, at trans time, when we have all concrete types available, we can repeat the trait selection to choose an actual implementation, which will then be generated in the output binary.

Overview

Trait resolution consists of three major parts:

  • Selection: Deciding how to resolve a specific obligation. For example, selection might decide that a specific obligation can be resolved by employing an impl which matches the Self type, or by using a parameter bound (e.g. T: Trait). In the case of an impl, selecting one obligation can create nested obligations because of where clauses on the impl itself. It may also require evaluating those nested obligations to resolve ambiguities.

  • Fulfillment: The fulfillment code is what tracks that obligations are completely fulfilled. Basically it is a worklist of obligations to be selected: once selection is successful, the obligation is removed from the worklist and any nested obligations are enqueued.

  • Coherence: The coherence checks are intended to ensure that there are never overlapping impls, where two impls could be used with equal precedence.
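The interplay between selection and fulfillment can be pictured as a worklist loop. The sketch below is a toy: obligations are plain strings, `select` is a hard-coded stand-in for real selection, and the ambiguous outcome is left out entirely. Both function names are invented for the illustration.

```rust
use std::collections::VecDeque;

/// Toy "impl table": maps an obligation to the nested obligations of the impl
/// that satisfies it. `None` means no impl applies.
fn select(obligation: &str) -> Option<Vec<&'static str>> {
    match obligation {
        "Vec<u32>: Clone" => Some(vec!["u32: Clone"]),
        "u32: Clone" => Some(vec![]),
        _ => None,
    }
}

/// Process the worklist until it is empty; report the first obligation that fails.
fn fulfill(root: &'static str) -> Result<Vec<&'static str>, &'static str> {
    let mut worklist = VecDeque::from(vec![root]);
    let mut proven = Vec::new();
    while let Some(obligation) = worklist.pop_front() {
        match select(obligation) {
            Some(nested) => {
                // Selected: drop it from the worklist and enqueue what the impl requires.
                proven.push(obligation);
                worklist.extend(nested);
            }
            None => return Err(obligation),
        }
    }
    Ok(proven)
}
```

Starting from `"Vec<u32>: Clone"`, the loop proves that obligation, enqueues the nested `"u32: Clone"`, proves it too, and stops when the worklist is empty.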

Selection

Selection is the process of deciding whether an obligation can be resolved and, if so, how it is to be resolved (via impl, where clause, etc). The main interface is the select() function, which takes an obligation and returns a SelectionResult. There are three possible outcomes:

  • Ok(Some(selection)) – yes, the obligation can be resolved, and selection indicates how. If the obligation was resolved via an impl, then selection may also indicate nested obligations that are required by the impl.

  • Ok(None) – we are not yet sure whether the obligation can be resolved or not. This happens most commonly when the obligation contains unbound type variables.

  • Err(err) – the obligation definitely cannot be resolved due to a type error or because there are no impls that could possibly apply.

The basic algorithm for selection is broken into two big phases: candidate assembly and confirmation.

Note that because of how lifetime inference works, it is not possible to give back immediate feedback as to whether a unification or subtype relationship between lifetimes holds or not. Therefore, lifetime matching is not considered during selection. This is reflected in the fact that subregion assignment is infallible. This may yield lifetime constraints that will later be found to be in error (in contrast, the non-lifetime-constraints have already been checked during selection and can never cause an error, though naturally they may lead to other errors downstream).

Candidate assembly

Candidate assembly searches for impls/where-clauses/etc that might possibly be used to satisfy the obligation. Each of those is called a candidate. To avoid ambiguity, we want to find exactly one candidate that is definitively applicable. In some cases, we may not know whether an impl/where-clause applies or not – this occurs when the obligation contains unbound inference variables.

The subroutines that decide whether a particular impl/where-clause/etc applies to a particular obligation are collectively referred to as the process of matching. At the moment, this amounts to unifying the Self types, but in the future we may also recursively consider some of the nested obligations, in the case of an impl.

TODO: what does "unifying the Self types" mean? The Self of the obligation with that of an impl?

The basic idea for candidate assembly is to do a first pass in which we identify all possible candidates. During this pass, all that we do is try and unify the type parameters. (In particular, we ignore any nested where clauses.) Presuming that this unification succeeds, the impl is added as a candidate.

Once this first pass is done, we can examine the set of candidates. If it is a singleton set, then we are done: this is the only impl in scope that could possibly apply. Otherwise, we can winnow down the set of candidates by using where clauses and other conditions. If this reduced set yields a single, unambiguous entry, we're good to go, otherwise the result is considered ambiguous.

The basic process: Inferring based on the impls we see

This process is easier if we work through some examples. Consider the following trait:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

This trait just has one method. It's about as simple as it gets. It converts from the (implicit) Self type to the Target type. If we wanted to permit conversion between isize and usize, we might implement Convert like so:

impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize

Now imagine there is some code like the following:

let x: isize = ...;
let y = x.convert();

The call to convert will generate a trait reference Convert<$Y> for isize, where $Y is the type variable representing the type of y. Of the two impls we can see, the only one that matches is Convert<usize> for isize. Therefore, we can select this impl, which will cause the type of $Y to be unified to usize. (Note that while assembling candidates, we do the initial unifications in a transaction, so that they don't affect one another.)
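As a rough, compilable sketch of the example above, the following fills in the elided impl bodies with plain `as` casts and adds a helper, `type_id_of`. Both are illustrative assumptions rather than anything rustc defines. The helper reports which type inference picked for `y` without constraining that type itself.

```rust
use std::any::TypeId;

trait Convert<Target> {
    fn convert(&self) -> Target;
}

// isize -> usize (the cast is only a stand-in for a real conversion)
impl Convert<usize> for isize {
    fn convert(&self) -> usize {
        *self as usize
    }
}

// usize -> isize
impl Convert<isize> for usize {
    fn convert(&self) -> isize {
        *self as isize
    }
}

// Hypothetical helper: reports the type chosen for a value without
// adding any constraint on that type.
fn type_id_of<T: 'static>(_value: &T) -> TypeId {
    TypeId::of::<T>()
}

// Returns true if the unannotated `y` was inferred to be `usize`.
fn y_is_usize() -> bool {
    let x: isize = 7;
    // Obligation: `isize: Convert<$Y>`. Only one impl has `Self = isize`,
    // so selecting it unifies `$Y` with `usize`.
    let y = x.convert();
    type_id_of(&y) == TypeId::of::<usize>()
}

fn main() {
    assert!(y_is_usize());
}
```

Note that `y` never receives a type annotation here; the only information that pins it down is the single impl whose Self type is isize.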

TODO: The example says we can "select" the impl, but this section is talking specifically about candidate assembly. Does this mean we can sometimes skip confirmation? Or is this poor wording? TODO: Is the unification of $Y part of trait resolution or type inference? Or is this not the same type of "inference variable" as in type inference?

Winnowing: Resolving ambiguities

But what happens if there are multiple impls where all the types unify? Consider this example:

trait Get {
    fn get(&self) -> Self;
}

impl<T:Copy> Get for T {
    fn get(&self) -> T { *self }
}

impl<T:Get> Get for Box<T> {
    fn get(&self) -> Box<T> { Box::new(get_it(&**self)) }
}

What happens when we invoke get_it(&Box::new(1_u16)), for example? In this case, the Self type is Box<u16> – that unifies with both impls, because the first applies to all types T, and the second to all Box<T>. In order for this to be unambiguous, the compiler does a winnowing pass that considers where clauses and attempts to remove candidates. In this case, the first impl only applies if Box<u16> : Copy, which doesn't hold. After winnowing, then, we are left with just one candidate, so we can proceed.
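The same winnowing step can be observed in a small self-contained program. This is only a sketch under two assumptions that differ from the text: it uses a hypothetical local type `Wrapper<T>` in place of `Box<T>`, and it calls the trait method directly rather than going through `get_it`.

```rust
trait Get {
    fn get(&self) -> Self;
}

// Candidate 1: applies to any `T`, provided `T: Copy`.
impl<T: Copy> Get for T {
    fn get(&self) -> T {
        *self
    }
}

// Hypothetical local stand-in for `Box<T>`; deliberately not `Copy`.
#[derive(Debug, PartialEq)]
struct Wrapper<T>(T);

// Candidate 2: applies to any `Wrapper<T>`, provided `T: Get`.
impl<T: Get> Get for Wrapper<T> {
    fn get(&self) -> Wrapper<T> {
        Wrapper(self.0.get())
    }
}

fn main() {
    // `Self = Wrapper<u16>` unifies with the headers of both impls.
    // Winnowing drops candidate 1 because `Wrapper<u16>: Copy` does not
    // hold, which leaves candidate 2 as the only applicable impl.
    let copy = Get::get(&Wrapper(1_u16));
    assert_eq!(copy, Wrapper(1_u16));

    // For a plain `u16`, only candidate 1 unifies in the first place.
    assert_eq!(Get::get(&7_u16), 7_u16);
}
```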

where clauses

Besides an impl, the other major way to resolve an obligation is via a where clause. The selection process is always given a parameter environment which contains a list of where clauses, which are basically obligations that we can assume are satisfiable. We will iterate over that list and check whether our current obligation can be found in that list. If so, it is considered satisfied. More precisely, we want to check whether there is a where-clause obligation that is for the same trait (or some subtrait) and which can match against the obligation.

Consider this simple example:

trait A1 {
    fn do_a1(&self);
}
trait A2 : A1 { ... }

trait B {
    fn do_b(&self);
}

fn foo<X:A2+B>(x: X) {
    x.do_a1(); // (*)
    x.do_b();  // (#)
}

In the body of foo, clearly we can use methods of A1, A2, or B on variable x. The line marked (*) will incur an obligation X: A1, while the line marked (#) will incur an obligation X: B. Meanwhile, the parameter environment will contain two where-clauses: X : A2 and X : B. For each obligation, then, we search this list of where-clauses. The obligation X: B trivially matches against the where-clause X: B. To resolve an obligation X:A1, we would note that X:A2 implies that X:A1.
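A compilable version of this example might look as follows. The return values and the `Widget` type are hypothetical additions, present only so that the program can be run and checked.

```rust
trait A1 {
    fn do_a1(&self) -> &'static str;
}
trait A2: A1 {}

trait B {
    fn do_b(&self) -> &'static str;
}

fn foo<X: A2 + B>(x: X) -> (&'static str, &'static str) {
    // (*) obligation `X: A1`, implied by the where-clause `X: A2`
    let a = x.do_a1();
    // (#) obligation `X: B`, matched directly by the where-clause `X: B`
    let b = x.do_b();
    (a, b)
}

// Hypothetical type used only to call `foo`.
struct Widget;

impl A1 for Widget {
    fn do_a1(&self) -> &'static str {
        "a1"
    }
}
impl A2 for Widget {}
impl B for Widget {
    fn do_b(&self) -> &'static str {
        "b"
    }
}

fn main() {
    assert_eq!(foo(Widget), ("a1", "b"));
}
```

Inside `foo` there is no impl to select, since `X` is abstract; both calls are resolved purely from the parameter environment.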

Confirmation

Confirmation unifies the output type parameters of the trait with the values found in the obligation, possibly yielding a type error.

Suppose we have the following variation of the Convert example in the previous section:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize

let x: isize = ...;
let y: char = x.convert(); // NOTE: `y: char` now!

Confirmation is where an error would be reported because the impl specified that Target would be usize, but the obligation reported char. Hence the result of selection would be an error.

Note that the candidate impl is chosen based on the Self type, but confirmation is done based on (in this case) the Target type parameter.

Selection during translation

As mentioned above, during type checking, we do not store the results of trait selection. At trans time, we repeat the trait selection to choose a particular impl for each method call. In this second selection, we do not consider any where-clauses to be in scope because we know that each resolution will resolve to a particular impl.

One interesting twist has to do with nested obligations. In general, in trans, we only need to do a "shallow" selection for an obligation. That is, we wish to identify which impl applies, but we do not (yet) need to decide how to select any nested obligations. Nonetheless, we do currently do a complete resolution, and that is because it can sometimes inform the results of type inference. That is, we do not have the full substitutions in terms of the type variables of the impl available to us, so we must run trait selection to figure everything out.

TODO: is this still talking about trans?

Here is an example:

trait Foo { ... }
impl<U, T:Bar<U>> Foo for Vec<T> { ... }

impl Bar<usize> for isize { ... }

After one shallow round of selection for an obligation like Vec<isize> : Foo, we would know which impl we want, and we would know that T=isize, but we do not know the type of U. We must select the nested obligation isize : Bar<U> to find out that U=usize.

It would be good to only do just as much nested resolution as necessary. Currently, though, we just do a full resolution.

Higher-ranked trait bounds

One of the more subtle concepts in trait resolution is higher-ranked trait bounds. An example of such a bound is for<'a> MyTrait<&'a isize>. Let's walk through how selection on higher-ranked trait references works.

Basic matching and placeholder leaks

Suppose we have a trait Foo:


trait Foo<X> {
    fn foo(&self, x: X) { }
}

Let's say we have a function want_hrtb that wants a type which implements Foo<&'a isize> for any 'a:

fn want_hrtb<T>() where T : for<'a> Foo<&'a isize> { ... }

Now we have a struct AnyInt that implements Foo<&'a isize> for any 'a:

struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt { }

And the question is, does AnyInt : for<'a> Foo<&'a isize>? We want the answer to be yes. The algorithm for figuring it out is closely related to the subtyping for higher-ranked types (which is described here and also in a paper by SPJ. If you wish to understand higher-ranked subtyping, we recommend you read the paper). There are a few parts:

  1. Replace bound regions in the obligation with placeholders.
  2. Match the impl against the placeholder obligation.
  3. Check for placeholder leaks.

So let's work through our example.

  1. The first thing we would do is to replace the bound region in the obligation with a placeholder, yielding AnyInt : Foo<&'0 isize> (here '0 represents placeholder region #0). Note that we now have no quantifiers; in terms of the compiler type, this changes from a ty::PolyTraitRef to a TraitRef. We would then create the TraitRef from the impl, using fresh variables for its bound regions (and thus getting Foo<&'$a isize>, where '$a is the inference variable for 'a).

  2. Next we relate the two trait refs, yielding a graph with the constraint that '0 == '$a.

  3. Finally, we check for placeholder "leaks" – a leak is basically any attempt to relate a placeholder region to another placeholder region, or to any region that pre-existed the impl match. The leak check is done by searching from the placeholder region to find the set of regions that it is related to in any way. This is called the "taint" set. To pass the check, that set must consist solely of itself and region variables from the impl. If the taint set includes any other region, then the match is a failure. In this case, the taint set for '0 is {'0, '$a}, and hence the check will succeed.
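To make the taint-set idea concrete, here is a toy model of the leak check. It is not the compiler's implementation: the `Region` enum, `taint_set`, and `passes_leak_check` are all hypothetical names, and region relations are modelled as a flat list of undirected pairs.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Region {
    Placeholder(u32), // e.g. '0
    ImplVar(u32),     // e.g. '$a, a fresh variable created for the impl
    PreExisting(u32), // any region that existed before the impl match
}

// Collect every region related to `start`, directly or transitively.
fn taint_set(start: Region, relations: &[(Region, Region)]) -> HashSet<Region> {
    let mut tainted = HashSet::new();
    let mut stack = vec![start];
    while let Some(region) = stack.pop() {
        if !tainted.insert(region) {
            continue; // already visited
        }
        for &(a, b) in relations {
            if a == region {
                stack.push(b);
            }
            if b == region {
                stack.push(a);
            }
        }
    }
    tainted
}

// The check passes only if the taint set contains nothing but the
// placeholder itself and region variables from the impl.
fn passes_leak_check(placeholder: Region, relations: &[(Region, Region)]) -> bool {
    taint_set(placeholder, relations)
        .into_iter()
        .all(|r| r == placeholder || matches!(r, Region::ImplVar(_)))
}

fn main() {
    use Region::*;
    // '0 == '$a: the taint set is {'0, '$a}, so the check succeeds.
    assert!(passes_leak_check(Placeholder(0), &[(Placeholder(0), ImplVar(0))]));
    // '0 related to a region that pre-existed the match: a leak.
    assert!(!passes_leak_check(Placeholder(0), &[(Placeholder(0), PreExisting(0))]));
    // '0 related to another placeholder '1 through '$a: also a leak.
    let chained = [(Placeholder(0), ImplVar(0)), (ImplVar(0), Placeholder(1))];
    assert!(!passes_leak_check(Placeholder(0), &chained));
}
```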

Let's consider a failure case. Imagine we also have a struct

struct StaticInt;
impl Foo<&'static isize> for StaticInt { }

We want the obligation StaticInt : for<'a> Foo<&'a isize> to be considered unsatisfied. The check begins just as before. 'a is replaced with a placeholder '0 and the impl trait reference is instantiated to Foo<&'static isize>. When we relate those two, we get a constraint like 'static == '0. This means that the taint set for '0 is {'0, 'static}, which fails the leak check.
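Both cases can be tried out directly. In this sketch `want_hrtb` returns a string purely so that the successful case is observable; that return value is an assumption of the example and not part of the discussion above.

```rust
trait Foo<X> {
    fn foo(&self, _x: X) {}
}

// Only callable for a `T` that implements `Foo<&'a isize>` for every `'a`.
fn want_hrtb<T>() -> &'static str
where
    T: for<'a> Foo<&'a isize>,
{
    "satisfied"
}

struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt {}

struct StaticInt;
impl Foo<&'static isize> for StaticInt {}

fn main() {
    // `AnyInt: for<'a> Foo<&'a isize>` holds, so this call type-checks.
    assert_eq!(want_hrtb::<AnyInt>(), "satisfied");

    // `StaticInt` implements `Foo` only for `&'static isize`, so we expect
    // the compiler to reject the following line if it is uncommented:
    // want_hrtb::<StaticInt>();
}
```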

TODO: This is because 'static is not a region variable but is in the taint set, right?

Higher-ranked trait obligations

Once the basic matching is done, we get to another interesting topic: how to deal with impl obligations. I'll work through a simple example here. Imagine we have the traits Foo and Bar and an associated impl:


trait Foo<X> {
    fn foo(&self, x: X) { }
}

trait Bar<X> {
    fn bar(&self, x: X) { }
}

impl<X,F> Foo<X> for F
    where F : Bar<X>
{
}

Now let's say we have an obligation Baz: for<'a> Foo<&'a isize> and we match this impl. What obligation is generated as a result? We want to get Baz: for<'a> Bar<&'a isize>, but how does that happen?

After the matching, we are in a position where we have a placeholder substitution like X => &'0 isize. If we apply this substitution to the impl obligations, we get F : Bar<&'0 isize>. Obviously this is not directly usable because the placeholder region '0 cannot leak out of our computation.

What we do is to create an inverse mapping from the taint set of '0 back to the original bound region ('a, here) that '0 resulted from. (This is done in higher_ranked::plug_leaks). We know that the leak check passed, so this taint set consists solely of the placeholder region itself plus various intermediate region variables. We then walk the trait-reference and convert every region in that taint set back to a late-bound region, so in this case we'd wind up with Baz: for<'a> Bar<&'a isize>.

Caching and subtle considerations therewith

In general, we attempt to cache the results of trait selection. This is a somewhat complex process. Part of the reason for this is that we want to be able to cache results even when not all of the types in the trait reference are fully known. In that case, it may happen that the trait selection process is also influencing type variables, so we have to be able to not only cache the result of the selection process, but replay its effects on the type variables.

An example

The high-level idea of how the cache works is that we first replace all unbound inference variables with placeholder versions. Therefore, if we had a trait reference usize : Foo<$t>, where $t is an unbound inference variable, we might replace it with usize : Foo<$0>, where $0 is a placeholder type. We would then look this up in the cache.

If we found a hit, the hit would tell us the immediate next step to take in the selection process (e.g. apply impl #22, or apply where clause X : Foo<Y>).

On the other hand, if there is no hit, we need to go through the selection process from scratch. Suppose we come to the conclusion that the only possible impl is this one, with def-id 22:

impl Foo<isize> for usize { ... } // Impl #22

We would then record in the cache usize : Foo<$0> => ImplCandidate(22). Next we would confirm ImplCandidate(22), which would (as a side-effect) unify $t with isize.

Now, at some later time, we might come along and see a usize : Foo<$u>. When replaced with a placeholder, this would yield usize : Foo<$0>, just as before, and hence the cache lookup would succeed, yielding ImplCandidate(22). We would confirm ImplCandidate(22) which would (as a side-effect) unify $u with isize.
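The following toy model sketches the lookup described in this example. Everything in it is hypothetical (`Ty`, `TraitRef`, `freshen`, `SelectionCache`, and the hard-coded `select_from_scratch`); it only shows why two trait references that differ in their inference variables share one cache entry.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Usize,
    Isize,
    Infer(u32),       // unbound inference variable such as $t
    Placeholder(u32), // placeholder such as $0, used only in cache keys
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TraitRef {
    self_ty: Ty,
    trait_name: &'static str,
    arg: Ty,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Candidate {
    Impl(u32),
}

// Replace inference variables with placeholders numbered by first appearance.
fn freshen(trait_ref: &TraitRef) -> TraitRef {
    let mut seen: HashMap<u32, u32> = HashMap::new();
    let mut fresh = |ty: &Ty| match ty {
        Ty::Infer(var) => {
            let next = seen.len() as u32;
            Ty::Placeholder(*seen.entry(*var).or_insert(next))
        }
        other => other.clone(),
    };
    TraitRef {
        self_ty: fresh(&trait_ref.self_ty),
        trait_name: trait_ref.trait_name,
        arg: fresh(&trait_ref.arg),
    }
}

// Stand-in for real selection: pretend impl #22, `impl Foo<isize> for usize`,
// is the only applicable impl.
fn select_from_scratch(_key: &TraitRef) -> Candidate {
    Candidate::Impl(22)
}

// Confirmation replays the side effect: the trait argument becomes `isize`.
fn confirm(candidate: Candidate) -> Ty {
    match candidate {
        Candidate::Impl(_) => Ty::Isize,
    }
}

struct SelectionCache {
    entries: HashMap<TraitRef, Candidate>,
    misses: u32,
}

impl SelectionCache {
    fn new() -> Self {
        SelectionCache { entries: HashMap::new(), misses: 0 }
    }

    fn select(&mut self, trait_ref: &TraitRef) -> Candidate {
        let key = freshen(trait_ref);
        if let Some(&hit) = self.entries.get(&key) {
            return hit;
        }
        self.misses += 1;
        let candidate = select_from_scratch(&key);
        self.entries.insert(key, candidate);
        candidate
    }
}

fn main() {
    let mut cache = SelectionCache::new();
    let with_t = TraitRef { self_ty: Ty::Usize, trait_name: "Foo", arg: Ty::Infer(7) };
    let with_u = TraitRef { self_ty: Ty::Usize, trait_name: "Foo", arg: Ty::Infer(9) };
    assert_eq!(cache.select(&with_t), Candidate::Impl(22)); // miss, then recorded
    assert_eq!(cache.select(&with_u), Candidate::Impl(22)); // hit on the same key
    assert_eq!(cache.misses, 1);
    assert_eq!(confirm(Candidate::Impl(22)), Ty::Isize);
}
```

The cache stores only the candidate, not the resulting types, which is why confirmation has to run again on every hit to unify the caller's own inference variable.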

Where clauses and the local vs global cache

One subtle interaction is that the results of trait lookup will vary depending on what where clauses are in scope. Therefore, we actually have two caches, a local and a global cache. The local cache is attached to the ParamEnv, and the global cache is attached to the tcx. We use the local cache whenever the result might depend on the where clauses that are in scope. The determination of which cache to use is done by the method pick_candidate_cache in select.rs. At the moment, we use a very simple, conservative rule: if there are any where-clauses in scope, then we use the local cache. We used to try and draw finer-grained distinctions, but that led to a series of annoying and weird bugs like #22019 and #18290. This simple rule seems to be pretty clearly safe and also still retains a very high hit rate (~95% when compiling rustc).

TODO: it looks like pick_candidate_cache no longer exists. In general, is this section still accurate at all?

Specialization

TODO: where does Chalk fit in? Should we mention/discuss it here?

Defined in the specialize module.

The basic strategy is to build up a specialization graph during coherence checking (recall that coherence checking looks for overlapping impls). Insertion into the graph locates the right place to put an impl in the specialization hierarchy; if there is no right place (due to partial overlap but no containment), you get an overlap error. Specialization is consulted when selecting an impl (of course), and the graph is consulted when propagating defaults down the specialization hierarchy.

You might expect that the specialization graph would be used during selection – i.e. when actually performing specialization. This is not done for two reasons:

  • It's merely an optimization: given a set of candidates that apply, we can determine the most specialized one by comparing them directly for specialization, rather than consulting the graph. Given that we also cache the results of selection, the benefit of this optimization is questionable.

  • To build the specialization graph in the first place, we need to use selection (because we need to determine whether one impl specializes another). Dealing with this reentrancy would require some additional mode switch for selection. Given that there seems to be no strong reason to use the graph anyway, we stick with a simpler approach in selection, and use the graph only for propagating default implementations.

Trait impl selection can succeed even when multiple impls can apply, as long as they are part of the same specialization family. In that case, it returns a single impl on success – this is the most specialized impl known to apply. However, if there are any inference variables in play, the returned impl may not be the actual impl we will use at trans time. Thus, we take special care to avoid projecting associated types unless either (1) the associated type does not use default and thus cannot be overridden or (2) all input types are known concretely.

Trait solving (new-style)

🚧 This chapter describes "new-style" trait solving. This is still in the process of being implemented; this chapter serves as a kind of in-progress design document. If you would prefer to read about how the current trait solver works, check out this other chapter. 🚧

By the way, if you would like to help in hacking on the new solver, you will find instructions for getting involved in the Traits Working Group tracking issue!

The new-style trait solver is based on the work done in chalk. Chalk recasts Rust's trait system explicitly in terms of logic programming. It does this by "lowering" Rust code into a kind of logic program we can then execute queries against.

You can read more about chalk itself in the Overview of Chalk section.

Trait solving in rustc is based around a few key ideas:

  • Lowering to logic, which expresses Rust traits in terms of standard logical terms.
    • The goals and clauses chapter describes the precise form of rules we use, and lowering rules gives the complete set of lowering rules in a more reference-like form.
    • Lazy normalization, which is the technique we use to accommodate associated types when figuring out whether types are equal.
    • Region constraints, which are accumulated during trait solving but mostly ignored. This means that trait solving effectively ignores the precise regions involved, always – but we still remember the constraints on them so that those constraints can be checked by the type checker.
  • Canonical queries, which allow us to solve trait problems (like "is Foo implemented for the type Bar?") once, and then apply that same result independently in many different inference contexts.

This is not a complete list of topics. See the sidebar for more.

Ongoing work

The design of the new-style trait solving currently happens in two places:

chalk. The chalk repository is where we experiment with new ideas and designs for the trait system. It primarily consists of two parts:

  • a unit testing framework for the correctness and feasibility of the logical rules defining the new-style trait system.
  • the chalk_engine crate, which defines the new-style trait solver used both in the unit testing framework and in rustc.

rustc. Once we are happy with the logical rules, we proceed to implementing them in rustc. This mainly happens in librustc_traits.

Lowering to logic

The key observation here is that the Rust trait system is basically a kind of logic, and it can be mapped onto standard logical inference rules. We can then look for solutions to those inference rules in a very similar fashion to how e.g. a Prolog solver works. It turns out that we can't quite use Prolog rules (also called Horn clauses) but rather need a somewhat more expressive variant.

Rust traits and logic

One of the first observations is that the Rust trait system is basically a kind of logic. As such, we can map our struct, trait, and impl declarations into logical inference rules. For the most part, these are basically Horn clauses, though we'll see that to capture the full richness of Rust – and in particular to support generic programming – we have to go a bit further than standard Horn clauses.

To see how this mapping works, let's start with an example. Imagine we declare a trait and a few impls, like so:


trait Clone { }
impl Clone for usize { }
impl<T> Clone for Vec<T> where T: Clone { }

We could map these declarations to some Horn clauses, written in a Prolog-like notation, as follows:

Clone(usize).
Clone(Vec<?T>) :- Clone(?T).

// The notation `A :- B` means "A is true if B is true".
// Or, put another way, B implies A.

In Prolog terms, we might say that Clone(Foo) – where Foo is some Rust type – is a predicate that represents the idea that the type Foo implements Clone. These rules are program clauses; they state the conditions under which that predicate can be proven (i.e., considered true). So the first rule just says "Clone is implemented for usize". The next rule says "for any type ?T, Clone is implemented for Vec<?T> if Clone is implemented for ?T". So e.g. if we wanted to prove that Clone(Vec<Vec<usize>>), we would do so by applying the rules recursively:

  • Clone(Vec<Vec<usize>>) is provable if:
    • Clone(Vec<usize>) is provable if:
      • Clone(usize) is provable. (Which it is, so we're all good.)

But now suppose we tried to prove that Clone(Vec<Bar>). This would fail (after all, I didn't give an impl of Clone for Bar):

  • Clone(Vec<Bar>) is provable if:
    • Clone(Bar) is provable. (But it is not, as there are no applicable rules.)
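The two proof attempts above can be mimicked with a tiny recursive function. This is a deliberately naive sketch: the `Ty` enum and `clone_is_provable` are hypothetical, and the two program clauses are hard-coded as match arms rather than represented as data.

```rust
#[derive(Debug)]
enum Ty {
    Usize,
    Bar, // a type for which no `Clone` rule was given
    Vec(Box<Ty>),
}

fn clone_is_provable(ty: &Ty) -> bool {
    match ty {
        // Clone(usize).
        Ty::Usize => true,
        // Clone(Vec<?T>) :- Clone(?T).
        Ty::Vec(elem) => clone_is_provable(elem),
        // No applicable rule.
        Ty::Bar => false,
    }
}

fn main() {
    let vec_vec_usize = Ty::Vec(Box::new(Ty::Vec(Box::new(Ty::Usize))));
    let vec_bar = Ty::Vec(Box::new(Ty::Bar));
    assert!(clone_is_provable(&vec_vec_usize));
    assert!(!clone_is_provable(&vec_bar));
}
```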

We can easily extend the example above to cover generic traits with more than one input type. So imagine the Eq<T> trait, which declares that Self is equatable with a value of type T:

trait Eq<T> { ... }
impl Eq<usize> for usize { }
impl<T: Eq<U>, U> Eq<Vec<U>> for Vec<T> { }

That could be mapped as follows:

Eq(usize, usize).
Eq(Vec<?T>, Vec<?U>) :- Eq(?T, ?U).

So far so good.

Type-checking normal functions

OK, now that we have defined some logical rules that are able to express when traits are implemented and to handle associated types, let's turn our focus a bit towards type-checking. Type-checking is interesting because it is what gives us the goals that we need to prove. That is, everything we've seen so far has been about how we derive the rules by which we can prove goals from the traits and impls in the program; but we are also interested in how to derive the goals that we need to prove, and those come from type-checking.

Consider type-checking the function foo() here:

fn foo() { bar::<usize>() }
fn bar<U: Eq<U>>() { }

This function is very simple, of course: all it does is to call bar::<usize>(). Now, looking at the definition of bar(), we can see that it has one where-clause U: Eq<U>. So, that means that foo() will have to prove that usize: Eq<usize> in order to show that it can call bar() with usize as the type argument.

If we wanted, we could write a Prolog predicate that defines the conditions under which bar() can be called. We'll say that those conditions are called being "well-formed":

barWellFormed(?U) :- Eq(?U, ?U).

Then we can say that foo() type-checks if the reference to bar::<usize> (that is, bar() applied to the type usize) is well-formed:

fooTypeChecks :- barWellFormed(usize).

If we try to prove the goal fooTypeChecks, it will succeed:

  • fooTypeChecks is provable if:
    • barWellFormed(usize), which is provable if:
      • Eq(usize, usize), which is provable because of an impl.
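The same derivation can be written out as ordinary functions, one per predicate. As before, this is only an illustrative sketch with hypothetical names (`Ty`, `eq_is_provable`, `bar_well_formed`, `foo_type_checks`); a real solver would treat the clauses as data rather than as hard-coded Rust.

```rust
#[derive(Debug)]
enum Ty {
    Usize,
    Bar, // a type with no `Eq` impl
    Vec(Box<Ty>),
}

// Eq(usize, usize).
// Eq(Vec<?T>, Vec<?U>) :- Eq(?T, ?U).
fn eq_is_provable(lhs: &Ty, rhs: &Ty) -> bool {
    match (lhs, rhs) {
        (Ty::Usize, Ty::Usize) => true,
        (Ty::Vec(t), Ty::Vec(u)) => eq_is_provable(t, u),
        _ => false,
    }
}

// barWellFormed(?U) :- Eq(?U, ?U).
fn bar_well_formed(u: &Ty) -> bool {
    eq_is_provable(u, u)
}

// fooTypeChecks :- barWellFormed(usize).
fn foo_type_checks() -> bool {
    bar_well_formed(&Ty::Usize)
}

fn main() {
    assert!(foo_type_checks());
    // Calling `bar::<Bar>()` would not be well-formed: no rule proves Eq(Bar, Bar).
    assert!(!bar_well_formed(&Ty::Bar));
}
```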

Ok, so far so good. Let's move on to type-checking a more complex function.

Type-checking generic functions: beyond Horn clauses

In the last section, we used standard Prolog Horn clauses (augmented with Rust's notion of type equality) to type-check some simple Rust functions. But that only works when we are type-checking non-generic functions. If we want to type-check a generic function, it turns out we need a stronger notion of goal than what Prolog can provide. To see what I'm talking about, let's revamp our previous example to make foo generic:

fn foo<T: Eq<T>>() { bar::<T>() }
fn bar<U: Eq<U>>() { }

To type-check the body of foo, we need to be able to hold the type T "abstract". That is, we need to check that the body of foo is type-safe for all types T, not just for some specific type. We might express this like so:

fooTypeChecks :-
  // for all types T...
  forall<T> {
    // ...if we assume that Eq(T, T) is provable...
    if (Eq(T, T)) {
      // ...then we can prove that `barWellFormed(T)` holds.
      barWellFormed(T)
    }
  }.

The notation I'm using here is the notation I've been using in my prototype implementation; it's similar to standard mathematical notation but a bit Rustified. Anyway, the problem is that standard Horn clauses don't allow universal quantification (forall) or implication (if) in goals (though many Prolog engines do support them, as an extension). For this reason, we need to accept something called "first-order hereditary Harrop" (FOHH) clauses – this long name basically means "standard Horn clauses with forall and if in the body". But it's nice to know the proper name, because there is a lot of work describing how to efficiently handle FOHH clauses; see for example Gopalan Nadathur's excellent "A Proof Procedure for the Logic of Hereditary Harrop Formulas" in the bibliography.

It turns out that supporting FOHH is not really all that hard. And once we are able to do that, we can easily describe the type-checking rule for generic functions like foo in our logic.
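For reference, the generic version of foo is ordinary Rust that the compiler accepts today. In the sketch below, the functions return `bool` only so that the call can be checked at run time; that detail is an assumption of the example. The locally declared `Eq<T>` trait shadows the standard library's `Eq` within this file.

```rust
// A user-defined `Eq<T>`, as in the text above.
trait Eq<T> {}
impl Eq<usize> for usize {}

fn bar<U: Eq<U>>() -> bool {
    true
}

// The body is checked once, for all `T`. While checking it, `T: Eq<T>` is
// an assumption (the `if (Eq(T, T))` part of the goal), and that assumption
// is exactly what is needed to show `barWellFormed(T)`.
fn foo<T: Eq<T>>() -> bool {
    bar::<T>()
}

fn main() {
    // At this call site the caller must prove `usize: Eq<usize>`,
    // which follows from the impl.
    assert!(foo::<usize>());
}
```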

Source

This page is a lightly adapted version of a blog post by Nicholas Matsakis.

Goals and clauses

In logic programming terms, a goal is something that you must prove and a clause is something that you know is true. As described in the lowering to logic chapter, Rust's trait solver is based on an extension of hereditary Harrop (HH) clauses, which extend traditional Prolog Horn clauses with a few new superpowers.

Goals and clauses meta structure

In Rust's solver, goals and clauses have the following forms (note that the two definitions reference one another):

Goal = DomainGoal           // defined in the section below
        | Goal && Goal
        | Goal || Goal
        | exists<K> { Goal }   // existential quantification
        | forall<K> { Goal }   // universal quantification
        | if (Clause) { Goal } // implication
        | true                 // something that's trivially true
        | ambiguous            // something that's never provable

Clause = DomainGoal
        | Clause :- Goal     // if can prove Goal, then Clause is true
        | Clause && Clause
        | forall<K> { Clause }

K = <type>     // a "kind"
    | <lifetime>

The proof procedure for these sorts of goals is actually quite straightforward. Essentially, it's a form of depth-first search. The paper "A Proof Procedure for the Logic of Hereditary Harrop Formulas" gives the details.
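To give a feel for that depth-first search, here is a heavily simplified prover. It is a sketch under strong assumptions and not the algorithm used by rustc or chalk: atoms are plain strings, there are no quantifiers and no unification, and a hypothetical `fuel` counter cuts off inductive cycles.

```rust
#[derive(Clone, Debug)]
enum Goal {
    Atom(&'static str), // stands in for a DomainGoal
    And(Box<Goal>, Box<Goal>),
    Or(Box<Goal>, Box<Goal>),
    Implies(Box<Clause>, Box<Goal>), // if (Clause) { Goal }
    True,
}

// `head :- body`
#[derive(Clone, Debug)]
struct Clause {
    head: &'static str,
    body: Goal,
}

// A clause with a trivially true body, i.e. a fact.
fn fact(head: &'static str) -> Clause {
    Clause { head, body: Goal::True }
}

fn prove(goal: &Goal, program: &[Clause], fuel: u32) -> bool {
    match goal {
        Goal::True => true,
        Goal::And(a, b) => prove(a, program, fuel) && prove(b, program, fuel),
        Goal::Or(a, b) => prove(a, program, fuel) || prove(b, program, fuel),
        Goal::Implies(hypothesis, inner) => {
            // Add the hypothesis to the program, then prove the inner goal.
            let mut extended = program.to_vec();
            extended.push((**hypothesis).clone());
            prove(inner, &extended, fuel)
        }
        Goal::Atom(name) => {
            // Try each clause whose head matches; give up when out of fuel.
            fuel > 0
                && program
                    .iter()
                    .any(|clause| clause.head == *name && prove(&clause.body, program, fuel - 1))
        }
    }
}

fn main() {
    let program = vec![
        fact("Clone(usize)"),
        Clause { head: "Clone(Vec<usize>)", body: Goal::Atom("Clone(usize)") },
        Clause { head: "barWellFormed(T)", body: Goal::Atom("Eq(T, T)") },
    ];
    assert!(prove(&Goal::Atom("Clone(Vec<usize>)"), &program, 16));
    assert!(!prove(&Goal::Atom("Clone(Bar)"), &program, 16));

    // if (Eq(T, T)) { barWellFormed(T) }
    let goal = Goal::Implies(
        Box::new(fact("Eq(T, T)")),
        Box::new(Goal::Atom("barWellFormed(T)")),
    );
    assert!(prove(&goal, &program, 16));
    // Without the hypothesis, the same atom is not provable.
    assert!(!prove(&Goal::Atom("barWellFormed(T)"), &program, 16));
}
```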

In terms of code, these types are defined in librustc/traits/mod.rs in rustc, and in chalk-ir/src/lib.rs in chalk.

Domain goals

Domain goals are the atoms of the trait logic. As can be seen in the definitions given above, general goals are essentially combinations of domain goals.

Moreover, flattening the definition of clauses given previously a bit, one can see that clauses are always of the form:

forall<K1, ..., Kn> { DomainGoal :- Goal }

Hence domain goals are in fact the left-hand sides (LHS) of clauses. That is, at the most granular level, domain goals are what the trait solver will end up trying to prove.

To define the set of domain goals in our system, we need to first introduce a few simple formulations. A trait reference consists of the name of a trait along with a suitable set of inputs P0..Pn:

TraitRef = P0: TraitName<P1..Pn>

So, for example, u32: Display is a trait reference, as is Vec<T>: IntoIterator. Note that Rust surface syntax also permits some extra things, like associated type bindings (Vec<T>: IntoIterator<Item = T>), that are not part of a trait reference.

A projection consists of an associated item reference along with its inputs P0..Pm:

Projection = <P0 as TraitName<P1..Pn>>::AssocItem<Pn+1..Pm>

Given these, we can define a DomainGoal as follows:

DomainGoal = Holds(WhereClause)
            | FromEnv(TraitRef)
            | FromEnv(Type)
            | WellFormed(TraitRef)
            | WellFormed(Type)
            | Normalize(Projection -> Type)

WhereClause = Implemented(TraitRef)
            | ProjectionEq(Projection = Type)
            | Outlives(Type: Region)
            | Outlives(Region: Region)

WhereClause refers to a where clause that a Rust user would actually be able to write in a Rust program. This abstraction exists only as a convenience, as we sometimes want to deal only with domain goals that can actually be written in Rust.

Let's break down each one of these, one-by-one.

Implemented(TraitRef)

e.g. Implemented(i32: Copy)

True if the given trait is implemented for the given input types and lifetimes.

ProjectionEq(Projection = Type)

e.g. ProjectionEq(<T as Iterator>::Item = u8)

The given associated type Projection is equal to Type; this can be proved with either normalization or using placeholder associated types. See the section on associated types.

Normalize(Projection -> Type)

e.g. Normalize(<T as Iterator>::Item -> u8)

The given associated type Projection can be normalized to Type.

As discussed in the section on associated types, Normalize implies ProjectionEq, but not vice versa. In general, proving Normalize(<T as Trait>::Item -> U) also requires proving Implemented(T: Trait).

FromEnv(TraitRef)

e.g. FromEnv(Self: Add<i32>)

True if the inner TraitRef is assumed to be true, that is, if it can be derived from the in-scope where clauses.

For example, given the following function:


fn loud_clone<T: Clone>(stuff: &T) -> T {
    println!("cloning!");
    stuff.clone()
}

Inside the body of our function, we would have FromEnv(T: Clone). In-scope where clauses nest, so a function body inside an impl body inherits the impl body's where clauses, too.

This and the next rule are used to implement implied bounds. As we'll see in the section on lowering, FromEnv(TraitRef) implies Implemented(TraitRef), but not vice versa. This distinction is crucial to implied bounds.

FromEnv(Type)

e.g. FromEnv(HashSet<K>)

True if the inner Type is assumed to be well-formed, that is, if it is an input type of a function or an impl.

For example, given the following code:

struct HashSet<K> where K: Hash { ... }

fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
    println!("inserting!");
    set.insert(item);
}

HashSet<K> is an input type of the loud_insert function. Hence, we assume it to be well-formed, so we would have FromEnv(HashSet<K>) inside the body of our function. As we'll see in the section on lowering, FromEnv(HashSet<K>) implies Implemented(K: Hash) because the HashSet declaration was written with a K: Hash where clause. Hence, we don't need to repeat that bound on the loud_insert function: we rather automatically assume that it is true.

WellFormed(Item)

These goals imply that the given item is well-formed.

We can talk about different types of items being well-formed:

  • Types, like WellFormed(Vec<i32>), which is true in Rust, or WellFormed(Vec<str>), which is not (because str is not Sized.)

  • TraitRefs, like WellFormed(Vec<i32>: Clone).

Well-formedness is important to implied bounds. In particular, the reason it is okay to assume FromEnv(T: Clone) in the loud_clone example is that we also verify WellFormed(T: Clone) for each call site of loud_clone. Similarly, it is okay to assume FromEnv(HashSet<K>) in the loud_insert example because we will verify WellFormed(HashSet<K>) for each call site of loud_insert.

Outlives(Type: Region), Outlives(Region: Region)

e.g. Outlives(&'a str: 'b), Outlives('a: 'static)

True if the given type or region on the left outlives the right-hand region.

Coinductive goals

Most goals in our system are "inductive". In an inductive goal, circular reasoning is disallowed. Consider this example clause:

    Implemented(Foo: Bar) :-
        Implemented(Foo: Bar).

Considered inductively, this clause is useless: if we are trying to prove Implemented(Foo: Bar), we would then recursively have to prove Implemented(Foo: Bar), and that cycle would continue ad infinitum. (The trait solver does terminate here; it simply considers that Implemented(Foo: Bar) is not known to be true.)

However, some goals are co-inductive. Simply put, this means that cycles are OK. So, if Bar were a co-inductive trait, then the rule above would be perfectly valid, and it would indicate that Implemented(Foo: Bar) is true.

Auto traits are one example in Rust where co-inductive goals are used. Consider the Send trait, and imagine that we have this struct:


struct Foo {
    next: Option<Box<Foo>>
}

The default rules for auto traits say that Foo is Send if the types of its fields are Send. Therefore, we would have a rule like

Implemented(Foo: Send) :-
    Implemented(Option<Box<Foo>>: Send).

As you can probably imagine, proving that Option<Box<Foo>>: Send is going to wind up circularly requiring us to prove that Foo: Send again. So this would be an example where we wind up in a cycle – but that's ok, we do consider Foo: Send to hold, even though it references itself.
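This is easy to confirm with the compiler itself. In the sketch below, `is_send` is a hypothetical helper whose only purpose is to carry a `T: Send` bound; it compiles for `Foo` even though the proof of `Foo: Send` passes through `Foo` again.

```rust
struct Foo {
    next: Option<Box<Foo>>,
}

// Hypothetical helper: it can only be instantiated with a `T` that is `Send`.
fn is_send<T: Send>() -> bool {
    true
}

fn main() {
    // Proving `Foo: Send` requires `Option<Box<Foo>>: Send`, which cycles
    // back to `Foo: Send`. The compiler accepts this cycle.
    assert!(is_send::<Foo>());

    // Build a short list so that the recursive field is actually used.
    let list = Foo { next: Some(Box::new(Foo { next: None })) };
    assert!(list.next.is_some());
}
```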

In general, co-inductive traits are used in Rust trait solving when we want to enumerate a fixed set of possibilities. In the case of auto traits, we are enumerating the set of reachable types from a given starting point (i.e., Foo can reach values of type Option<Box<Foo>>, which implies it can reach values of type Box<Foo>, and then of type Foo, and then the cycle is complete).

In addition to auto traits, WellFormed predicates are co-inductive. These are used to achieve a similar "enumerate all the cases" pattern, as described in the section on implied bounds.

Incomplete chapter

Some topics yet to be written:

  • Elaborate on the proof procedure
  • SLG solving – introduce negative reasoning

Equality and associated types

This section covers how the trait system handles equality between associated types. The full system consists of several moving parts, which we will introduce one by one:

  • Projection and the Normalize predicate
  • Placeholder associated type projections
  • The ProjectionEq predicate
  • Integration with unification

Associated type projection and normalization

When a trait defines an associated type (e.g., the Item type in the IntoIterator trait), that type can be referenced by the user using an associated type projection like <Option<u32> as IntoIterator>::Item.

Often, people will use the shorthand syntax T::Item. Presently, that syntax is expanded during "type collection" into the explicit form, though that is something we may want to change in the future.

In some cases, associated type projections can be normalized – that is, simplified – based on the types given in an impl. So, to continue with our example, the impl of IntoIterator for Option<T> declares (among other things) that Item = T:

impl<T> IntoIterator for Option<T> {
  type Item = T;
  ...
}

This means we can normalize the projection <Option<u32> as IntoIterator>::Item to just u32.

In this case, the projection was a "monomorphic" one – that is, it did not have any type parameters. Monomorphic projections are special because they can always be fully normalized.

Often, we can normalize other associated type projections as well. For example, <Option<?T> as IntoIterator>::Item, where ?T is an inference variable, can be normalized to just ?T.
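Normalization is visible in everyday Rust: a projection and its normalized form are interchangeable. A small sketch (the function names `first_item`, `takes_u32`, and `normalized_roundtrip` are made up for illustration):

```rust
/// The return type is written with a projection; the compiler normalizes
/// `<Option<u32> as IntoIterator>::Item` to `u32` using the impl for `Option<T>`.
fn first_item(opt: Option<u32>) -> Option<<Option<u32> as IntoIterator>::Item> {
    opt.into_iter().next()
}

/// Accepts only a plain `u32`.
fn takes_u32(x: u32) -> u32 {
    x
}

/// Passing a value whose declared type is the projection to `takes_u32`
/// compiles because both types normalize to the same thing.
fn normalized_roundtrip() -> u32 {
    let item: <Option<u32> as IntoIterator>::Item = 7;
    takes_u32(item)
}
```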

In our logic, normalization is defined by a predicate Normalize. The Normalize clauses arise only from impls. For example, the impl of IntoIterator for Option<T> that we saw above would be lowered to a program clause like so:

forall<T> {
    Normalize(<Option<T> as IntoIterator>::Item -> T) :-
        Implemented(Option<T>: IntoIterator)
}

where in this case, the one Implemented condition is always true.

Since we do not permit quantification over traits, this is really more like a family of program clauses, one for each associated type.

We could apply that rule to normalize either of the examples that we've seen so far.

Placeholder associated types

Sometimes however we want to work with associated types that cannot be normalized. For example, consider this function:

fn foo<T: IntoIterator>(...) { ... }

In this context, how would we normalize the type T::Item?

Without knowing what T is, we can't really do so. To represent this case, we introduce a type called a placeholder associated type projection. This is written like so: (IntoIterator::Item)<T>.

You may note that it looks a lot like a regular type (e.g., Option<T>), except that the "name" of the type is (IntoIterator::Item). This is not an accident: placeholder associated type projections work just like ordinary types like Vec<T> when it comes to unification. That is, they are only considered equal if (a) they are both references to the same associated type, like IntoIterator::Item and (b) their type arguments are equal.

Placeholder associated types are never written directly by the user. They are used internally by the trait system only, as we will see shortly.

In rustc, they correspond to the TyKind::UnnormalizedProjectionTy enum variant, declared in librustc/ty/sty.rs. In chalk, we use an ApplicationTy with a name living in a special namespace dedicated to placeholder associated types (see the TypeName enum declared in chalk-ir/src/lib.rs).

Projection equality

So far we have seen two ways to answer the question of "When can we consider an associated type projection equal to another type?":

  • the Normalize predicate could be used to transform projections when we knew which impl applied;
  • placeholder associated types can be used when we don't. This is also known as lazy normalization.

We now introduce the ProjectionEq predicate to bring those two cases together. The ProjectionEq predicate looks like so:

ProjectionEq(<T as IntoIterator>::Item = U)

and we will see that it can be proven either via normalization or via the placeholder type. As part of lowering an associated type declaration from some trait, we create two program clauses for ProjectionEq:

forall<T, U> {
    ProjectionEq(<T as IntoIterator>::Item = U) :-
        Normalize(<T as IntoIterator>::Item -> U)
}

forall<T> {
    ProjectionEq(<T as IntoIterator>::Item = (IntoIterator::Item)<T>)
}

These are the only two ProjectionEq program clauses we ever make for any given associated item.

Integration with unification

Now we are ready to discuss how associated type equality integrates with unification. As described in the type inference section, unification is basically a procedure with a signature like this:

Unify(A, B) = Result<(Subgoals, RegionConstraints), NoSolution>

In other words, we try to unify two things A and B. That procedure might just fail, in which case we get back Err(NoSolution). This would happen, for example, if we tried to unify u32 and i32.

The key point is that, on success, unification can also give back to us a set of subgoals that still remain to be proven. (It can also give back region constraints, but those are not relevant here).

Whenever unification encounters a non-placeholder associated type projection P being equated with some other type T, it always succeeds, but it produces a subgoal ProjectionEq(P = T) that is propagated back up. Thus it falls to the ordinary workings of the trait system to process that constraint.

If we unify two projections P1 and P2, then unification produces a variable X and asks us to prove that ProjectionEq(P1 = X) and ProjectionEq(P2 = X). (That used to be needed in an older system to prevent cycles; I rather doubt it still is. -nmatsakis)
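The behavior described in this section can be sketched with a toy unifier. Everything here is an assumption made for illustration: the `Ty` enum, the `ProjectionEq` and `NoSolution` structs, and the `unify` signature are hypothetical and much simpler than what rustc or chalk use. Region constraints and the binding of inference variables are left out entirely.

```rust
/// Toy types with just enough structure to illustrate the idea.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    /// An ordinary type such as `u32` or `Vec<T>`.
    Apply(&'static str, Vec<Ty>),
    /// A placeholder associated type such as `(IntoIterator::Item)<T>`.
    Placeholder(&'static str, Vec<Ty>),
    /// A non-placeholder projection such as `<T as IntoIterator>::Item`.
    Projection(&'static str, Vec<Ty>),
    /// A fresh variable; solving for it is not modeled in this sketch.
    Var(u32),
}

/// A subgoal `ProjectionEq(P = T)` handed back to the trait solver.
#[derive(Debug, PartialEq)]
struct ProjectionEq(Ty, Ty);

#[derive(Debug, PartialEq)]
struct NoSolution;

fn unify(a: &Ty, b: &Ty, next_var: &mut u32) -> Result<Vec<ProjectionEq>, NoSolution> {
    use Ty::*;
    match (a, b) {
        // Two projections: introduce a variable X and equate both with it.
        (Projection(..), Projection(..)) => {
            let x = Var(*next_var);
            *next_var += 1;
            Ok(vec![ProjectionEq(a.clone(), x.clone()), ProjectionEq(b.clone(), x)])
        }
        // A projection against another type always succeeds with a subgoal.
        (Projection(..), _) => Ok(vec![ProjectionEq(a.clone(), b.clone())]),
        (_, Projection(..)) => Ok(vec![ProjectionEq(b.clone(), a.clone())]),
        // Ordinary and placeholder types unify structurally.
        (Apply(n1, args1), Apply(n2, args2))
        | (Placeholder(n1, args1), Placeholder(n2, args2)) => {
            if n1 != n2 || args1.len() != args2.len() {
                return Err(NoSolution);
            }
            let mut subgoals = Vec::new();
            for (x, y) in args1.iter().zip(args2) {
                subgoals.extend(unify(x, y, next_var)?);
            }
            Ok(subgoals)
        }
        _ => Err(NoSolution),
    }
}
```

Unifying `u32` with `i32` fails with `NoSolution`, while unifying `<T as IntoIterator>::Item` with `u32` succeeds and returns a single `ProjectionEq` subgoal.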

Implied Bounds

Implied bounds remove the need to repeat where clauses written on a type declaration or a trait declaration. For example, say we have the following type declaration:

struct HashSet<K: Hash> {
    ...
}

then everywhere we use HashSet<K> as an "input" type, that is appearing in the receiver type of an impl or in the arguments of a function, we don't want to have to repeat the where K: Hash bound, as in:

// I don't want to have to repeat `where K: Hash` here.
impl<K> HashSet<K> {
    ...
}

// Same here.
fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
    println!("inserting!");
    set.insert(item);
}

Note that in the loud_insert example, HashSet<K> is not the type of the set argument of loud_insert, it only appears in the argument type &mut HashSet<K>: we care about every type appearing in the function's header (the header is the signature without the return type), not only types of the function's arguments.

The rationale for applying implied bounds to input types is that, for example, in order to call the loud_insert function above, the programmer must have produced the type HashSet<K> already, hence the compiler already verified that HashSet<K> was well-formed, i.e. that K effectively implemented Hash, as in the following example:

fn main() {
    // I am producing a value of type `HashSet<i32>`.
    // If `i32` was not `Hash`, the compiler would report an error here.
    let mut set: HashSet<i32> = HashSet::new();
    loud_insert(&mut set, 5);
}

Hence, we don't want to repeat where clauses for input types because that would sort of duplicate the work of the programmer, having to verify that their types are well-formed both when calling the function and when using them in the arguments of their function. The same reasoning applies when using an impl.
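For contrast, here is a sketch of what has to be written when implied bounds on type declarations are not available: the bound is spelled out again on every impl and function. The `Set` wrapper and its methods are hypothetical names chosen so that the example does not clash with the standard library's `HashSet`.

```rust
use std::collections::HashSet as StdHashSet;
use std::hash::Hash;

/// Like the `HashSet<K: Hash>` in the text, this type puts its bound on the
/// type declaration itself.
struct Set<K: Hash + Eq> {
    inner: StdHashSet<K>,
}

// Without implied bounds, `K: Hash + Eq` has to be repeated here...
impl<K: Hash + Eq> Set<K> {
    fn new() -> Self {
        Set { inner: StdHashSet::new() }
    }

    /// Returns `true` if the item was not present before.
    fn insert(&mut self, item: K) -> bool {
        self.inner.insert(item)
    }
}

// ...and here, even though a `Set<K>` could not exist otherwise.
fn loud_insert<K: Hash + Eq>(set: &mut Set<K>, item: K) -> bool {
    println!("inserting!");
    set.insert(item)
}
```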

Similarly, given the following trait declaration:

trait Copy where Self: Clone { // desugared version of `Copy: Clone`
    ...
}

then everywhere we bound over SomeType: Copy, we would like to be able to use the fact that SomeType: Clone without having to write it explicitly, as in:

fn loud_clone<T: Clone>(x: T) {
    println!("cloning!");
    x.clone();
}

fn fun_with_copy<T: Copy>(x: T) {
    println!("will clone a `Copy` type soon...");

    // I'm using `loud_clone<T: Clone>` with `T: Copy`, I know this
    // implies `T: Clone` so I don't want to have to write it explicitly.
    loud_clone(x);
}

The rationale for implied bounds for traits is that if a type implements Copy, that is, if there exists an impl Copy for that type, there ought to exist an impl Clone for that type, otherwise the compiler would have reported an error in the first place. So again, if we were forced to repeat the additional where SomeType: Clone everywhere even though we already know that SomeType: Copy holds, we would kind of duplicate the verification work.

Implied bounds are not yet completely enforced in rustc: at the moment, they only work for outlives requirements, super trait bounds, and bounds on associated types. The full RFC can be found here. We'll give here a brief view of how implied bounds work and why we chose to implement them that way. The complete set of lowering rules can be found in the corresponding chapter.

Implied bounds and lowering rules

Now we need to express implied bounds in terms of logical rules. We will start with exposing a naive way to do it. Suppose that we have the following traits:

trait Foo {
    ...
}

trait Bar where Self: Foo {
    ...
}

So we would like to say that if a type implements Bar, then necessarily it must also implement Foo. We might think that a clause like this would work:

forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
}

Now suppose that we just write this impl:

struct X;

impl Bar for X { }

Clearly this should not be allowed: indeed, we wrote a Bar impl for X, but the Bar trait requires that we also implement Foo for X, which we never did. In terms of what the compiler does, this would look like this:

struct X;

impl Bar for X {
    // We are in a `Bar` impl for the type `X`.
    // There is a `where Self: Foo` bound on the `Bar` trait declaration.
    // Hence I need to prove that `X` also implements `Foo` for that impl
    // to be legal.
}

So the compiler would try to prove Implemented(X: Foo). Of course it will not find any impl Foo for X since we did not write any. However, it will see our implied bound clause:

forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
}

so that it may be able to prove Implemented(X: Foo) if Implemented(X: Bar) holds. And it turns out that Implemented(X: Bar) does hold since we wrote a Bar impl for X! Hence the compiler will accept the Bar impl while it should not.

Implied bounds coming from the environment

So the naive approach does not work. What we need to do is to somehow decouple implied bounds from impls. Suppose we know that a type SomeType<...> implements Bar and we want to deduce that SomeType<...> must also implement Foo.

There are two possibilities: first, we have enough information about SomeType<...> to see that there exists a Bar impl in the program which covers SomeType<...>, for example a plain impl<...> Bar for SomeType<...>. Then if the compiler has done its job correctly, there must exist a Foo impl which covers SomeType<...>, e.g. another plain impl<...> Foo for SomeType<...>. In that case then, we can just use this impl and we do not need implied bounds at all.

Second possibility: we do not know enough about SomeType<...> in order to find a Bar impl which covers it, for example if SomeType<...> is just a type parameter in a function:

fn foo<T: Bar>() {
    // We'd like to deduce `Implemented(T: Foo)`.
}

That is, the information that T implements Bar here comes from the environment. The environment is the set of things that we assume to be true when we type check some Rust declaration. In that case, what we assume is that T: Bar. Then at that point, we might authorize ourselves to have some kind of "local" implied bound reasoning which would say Implemented(T: Foo) :- Implemented(T: Bar). This reasoning would only be done within our foo function in order to avoid the earlier problem where we had a global clause.

We can apply these local reasonings everywhere we can have an environment -- i.e. when we can write where clauses -- that is, inside impls, trait declarations, and type declarations.

Computing implied bounds with FromEnv

The previous subsection showed that it was only useful to compute implied bounds for facts coming from the environment. We talked about "local" rules, but there are multiple possible strategies to actually implement the locality of implied bounds.

In rustc, the current strategy is to elaborate bounds: that is, each time we have a fact in the environment, we recursively derive all the other things that are implied by this fact until we reach a fixed point. For example, if we have the following declarations:

trait A { }
trait B where Self: A { }
trait C where Self: B { }

fn foo<T: C>() {
    ...
}

then inside the foo function, we start with an environment containing only Implemented(T: C). Then because of implied bounds for the C trait, we elaborate Implemented(T: B) and add it to our environment. Because of implied bounds for the B trait, we elaborate Implemented(T: A) and add it to our environment as well. We cannot elaborate anything else, so we conclude that our final environment consists of Implemented(T: A + B + C).
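The elaboration strategy amounts to a small fixed-point computation. The following is a minimal sketch under simplifying assumptions: traits are identified by name only, every bound is about the same type `T`, and the function names `elaborate` and `example_env` are hypothetical rather than taken from rustc.

```rust
use std::collections::{BTreeSet, HashMap};

/// Starting from one trait bound on `T`, adds every bound implied by the
/// `where Self: ...` clauses until nothing new can be added.
/// `superbounds` maps each trait name to its direct superbounds.
fn elaborate(start: &str, superbounds: &HashMap<&str, Vec<&str>>) -> BTreeSet<String> {
    let mut env = BTreeSet::new();
    let mut worklist = vec![start.to_string()];
    while let Some(trait_name) = worklist.pop() {
        // `insert` returns false if the fact was already in the environment.
        if env.insert(trait_name.clone()) {
            if let Some(supers) = superbounds.get(trait_name.as_str()) {
                worklist.extend(supers.iter().map(|s| s.to_string()));
            }
        }
    }
    env
}

/// The `A`, `B`, `C` declarations from the text, elaborated from `T: C`.
fn example_env() -> BTreeSet<String> {
    let superbounds = HashMap::from([("A", vec![]), ("B", vec!["A"]), ("C", vec!["B"])]);
    elaborate("C", &superbounds)
}
```

For the declarations above, `example_env()` yields the three trait names `A`, `B`, and `C`, matching the final environment described in the text.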

In the new-style trait system, we like to encode as many things as possible with logical rules. So rather than "elaborating", we have a set of global program clauses defined like so:

forall<T> { Implemented(T: A) :- FromEnv(T: A). }

forall<T> { Implemented(T: B) :- FromEnv(T: B). }
forall<T> { FromEnv(T: A) :- FromEnv(T: B). }

forall<T> { Implemented(T: C) :- FromEnv(T: C). }
forall<T> { FromEnv(T: B) :- FromEnv(T: C). }

So these clauses are defined globally (that is, they are available from everywhere in the program) but they cannot be used on their own, because the hypothesis is always of the form FromEnv(...), which is a bit special. Indeed, as indicated by the name, FromEnv(...) facts can only come from the environment. How it works is that in the foo function, instead of having an environment containing Implemented(T: C), we replace this environment with FromEnv(T: C). From here and thanks to the above clauses, we see that we are able to reach any of Implemented(T: A), Implemented(T: B) or Implemented(T: C), which is what we wanted.
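To make this concrete, here is a toy backward-chaining prover over clauses of this shape. It is only a sketch: `Fact`, `Clause`, `global_clauses`, and `prove` are hypothetical names, traits are single characters, all facts are about the same type `T`, and the naive recursion terminates only because these particular clauses are acyclic. The fifth clause is written as FromEnv(T: B) :- FromEnv(T: C), which is what the reachability claim in the text requires.

```rust
/// A fact about a fixed type `T`, e.g. `Implemented(T: A)` or `FromEnv(T: A)`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Fact {
    Implemented(char),
    FromEnv(char),
}

/// A clause `head :- body` with a single hypothesis.
struct Clause {
    head: Fact,
    body: Fact,
}

/// The global clauses for the traits `A`, `B` and `C`.
fn global_clauses() -> Vec<Clause> {
    use Fact::*;
    vec![
        Clause { head: Implemented('A'), body: FromEnv('A') },
        Clause { head: Implemented('B'), body: FromEnv('B') },
        Clause { head: FromEnv('A'), body: FromEnv('B') },
        Clause { head: Implemented('C'), body: FromEnv('C') },
        Clause { head: FromEnv('B'), body: FromEnv('C') },
    ]
}

/// Proves `goal` either directly from the environment or through a clause
/// whose hypothesis can itself be proven.
fn prove(goal: Fact, env: &[Fact], clauses: &[Clause]) -> bool {
    env.contains(&goal)
        || clauses
            .iter()
            .any(|c| c.head == goal && prove(c.body, env, clauses))
}
```

With `FromEnv('C')` in the environment, all three `Implemented` facts become provable; with an empty environment, none of them is, which is the locality we were after.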

Implied bounds and well-formedness checking

Implied bounds are closely related to well-formedness checking. Well-formedness checking is the process of checking that the impls the programmer wrote are legal, what we referred to earlier as "the compiler doing its job correctly".

We already saw examples of illegal and legal impls:

trait Foo { }
trait Bar where Self: Foo { }

struct X;
struct Y;

impl Bar for X {
    // This impl is not legal: the `Bar` trait requires that we also
    // implement `Foo`, and we didn't.
}

impl Foo for Y {
    // This impl is legal: there is nothing to check as there are no where
    // clauses on the `Foo` trait.
}

impl Bar for Y {
    // This impl is legal: we have a `Foo` impl for `Y`.
}

We must define what "legal" and "illegal" mean. For this, we introduce another predicate: WellFormed(Type: Trait). We say that the trait reference Type: Trait is well-formed if Type meets the bounds written on the Trait declaration. For each impl we write, assuming that the where clauses declared on the impl hold, the compiler tries to prove that the corresponding trait reference is well-formed. The impl is legal if the compiler manages to do so.

Coming to the definition of WellFormed(Type: Trait), it would be tempting to define it as:

trait Trait where WC1, WC2, ..., WCn {
    ...
}
forall<Type> {
    WellFormed(Type: Trait) :- WC1 && WC2 && .. && WCn.
}

and indeed this was basically what was done in rustc until it was noticed that this mixed badly with implied bounds. The key thing is that implied bounds allow someone to derive all bounds implied by a fact in the environment, and to do so transitively, as we've seen with the A + B + C traits example. However, the WellFormed predicate as defined above only checks that the direct superbounds hold. That is, if we come back to our A + B + C example:

trait A { }
// No where clauses, always well-formed.
// forall<Type> { WellFormed(Type: A). }

trait B where Self: A { }
// We only check the direct superbound `Self: A`.
// forall<Type> { WellFormed(Type: B) :- Implemented(Type: A). }

trait C where Self: B { }
// We only check the direct superbound `Self: B`. We do not check
// the `Self: A` implied bound coming from the `Self: B` superbound.
// forall<Type> { WellFormed(Type: C) :- Implemented(Type: B). }

There is an asymmetry between the recursive power of implied bounds and the shallow checking of WellFormed. It turns out that this asymmetry can be exploited. Indeed, suppose that we define the following traits:

trait Partial where Self: Copy { }
// WellFormed(Self: Partial) :- Implemented(Self: Copy).

trait Complete where Self: Partial { }
// WellFormed(Self: Complete) :- Implemented(Self: Partial).

impl<T> Partial for T where T: Complete { }

impl<T> Complete for T { }

For the Partial impl, what the compiler must prove is:

forall<T> {
    if (T: Complete) { // assume that the where clauses hold
        WellFormed(T: Partial) // show that the trait reference is well-formed
    }
}

Proving WellFormed(T: Partial) amounts to proving Implemented(T: Copy). However, we have Implemented(T: Complete) in our environment: thanks to implied bounds, we can deduce Implemented(T: Partial). Using implied bounds one level deeper, we can deduce Implemented(T: Copy). Finally, the Partial impl is legal.

For the Complete impl, what the compiler must prove is:

forall<T> {
    WellFormed(T: Complete) // show that the trait reference is well-formed
}

Proving WellFormed(T: Complete) amounts to proving Implemented(T: Partial). We see that the impl Partial for T applies if we can prove Implemented(T: Complete), and it turns out we can prove this fact since our impl<T> Complete for T is a blanket impl without any where clauses.

So both impls are legal and the compiler accepts the program. Moreover, thanks to the Complete blanket impl, all types implement Complete. So we could now use this impl like so:

fn eat<T>(x: T) { }

fn copy_everything<T: Complete>(x: T) {
    eat(x);
    eat(x);
}

fn main() {
    let not_copiable = vec![1, 2, 3, 4];
    copy_everything(not_copiable);
}

In this program, we use the fact that Vec<i32> implements Complete, like any other type. Hence we can call copy_everything with an argument of type Vec<i32>. Inside the copy_everything function, we have the Implemented(T: Complete) bound in our environment. Thanks to implied bounds, we can deduce Implemented(T: Partial). Using implied bounds again, we deduce Implemented(T: Copy) and we can indeed call the eat function, which moves the argument twice since its argument is Copy. Problem: the T type was in fact Vec<i32>, which is not Copy at all, hence we will double-free the underlying vec storage, so we have a memory unsoundness in safe Rust.

Of course, disregarding the asymmetry between WellFormed and implied bounds, this bug was possible only because we had some kind of self-referencing impls. But self-referencing impls are very useful in practice and are not the real culprits in this affair.

Co-inductiveness of WellFormed

So the solution is to fix this asymmetry between WellFormed and implied bounds. For that, we need for the WellFormed predicate to not only require that the direct superbounds hold, but also all the bounds transitively implied by the superbounds. What we can do is to have the following rules for the WellFormed predicate:

trait A { }
// WellFormed(Self: A) :- Implemented(Self: A).

trait B where Self: A { }
// WellFormed(Self: B) :- Implemented(Self: B) && WellFormed(Self: A).

trait C where Self: B { }
// WellFormed(Self: C) :- Implemented(Self: C) && WellFormed(Self: B).

Notice that we are now also requiring Implemented(Self: Trait) for WellFormed(Self: Trait) to be true: this is to simplify the process of traversing all the implied bounds transitively. This does not change anything when checking whether impls are legal: since we assume that the where clauses hold inside the impl, we know that the corresponding trait reference does hold. Thanks to this setup, you can see that we indeed require a proof of the set of all bounds transitively implied by the where clauses.

However there is still a catch. Suppose that we have the following trait definition:

trait Foo where <Self as Foo>::Item: Foo {
    type Item;
}

so this definition is a bit more involved than the ones we've seen already because it defines an associated item. However, the well-formedness rule would not be more complicated:

WellFormed(Self: Foo) :-
    Implemented(Self: Foo) &&
    WellFormed(<Self as Foo>::Item: Foo).

Now we would like to write the following impl:

impl Foo for i32 {
    type Item = i32;
}

The Foo trait definition and the impl Foo for i32 are perfectly valid Rust: we're kind of recursively using our Foo impl in order to show that the associated value indeed implements Foo, but that's ok. But if we translate this to our well-formedness setting, the compiler proof process inside the Foo impl is the following: it starts with proving that the well-formedness goal WellFormed(i32: Foo) is true. In order to do that, it must prove the following goals: Implemented(i32: Foo) and WellFormed(<i32 as Foo>::Item: Foo). Implemented(i32: Foo) holds because there is our impl and there are no where clauses on it, so it's always true. However, because of the associated type value we used, WellFormed(<i32 as Foo>::Item: Foo) simplifies to just WellFormed(i32: Foo). So in order to prove its original goal WellFormed(i32: Foo), the compiler needs to prove WellFormed(i32: Foo): this clearly is a cycle, and cycles are usually rejected by the trait solver, unless the WellFormed predicate is made co-inductive.

Co-inductive predicates, as discussed in the chapter on goals and clauses, are predicates for which the trait solver accepts cycles. In our setting, this would be a valid thing to do: indeed, the WellFormed predicate just serves as a way of enumerating all the implied bounds. Hence, it's like a fixed point algorithm: it tries to grow the set of implied bounds until there is nothing more to add. Here, a cycle in the chain of WellFormed predicates just means that there are no more bounds to add in that direction, so we can just accept this cycle and focus on other directions. It's easy to prove that under these co-inductive semantics, we are effectively visiting all the transitive implied bounds, and only these.

Implied bounds on types

We mainly talked about implied bounds for traits because this was the most subtle regarding implementation. Implied bounds on types are simpler, especially because if we assume that a type is well-formed, we don't use that fact to deduce that other types are well-formed, we only use it to deduce that e.g. some trait bounds hold.

For types, we just use rules like these ones:

struct Type<...> where WC1, ..., WCn {
    ...
}
forall<...> {
    WellFormed(Type<...>) :- WC1, ..., WCn.
}

forall<...> {
    FromEnv(WC1) :- FromEnv(Type<...>).
    ...
    FromEnv(WCn) :- FromEnv(Type<...>).
}

We can see that we have this asymmetry between the well-formedness check, which only verifies that the direct superbounds hold, and implied bounds, which give access to all bounds transitively implied by the where clauses. In that case this is ok because, as we said, we don't use FromEnv(Type<...>) to deduce other FromEnv(OtherType<...>) things, nor do we use FromEnv(Type: Trait) to deduce FromEnv(OtherType<...>) things. So in that sense type definitions are "less recursive" than traits, and we saw in a previous subsection that it was the combination of asymmetry and recursive traits and impls that led to unsoundness. As such, the WellFormed(Type<...>) predicate does not need to be co-inductive.

This asymmetry optimization is useful because in a real Rust program, we have to check the well-formedness of types very often (e.g. for each type which appears in the body of a function).

Region constraints

To be written.

Chalk does not have the concept of region constraints, and as of this writing, work on rustc was not far enough to worry about them.

In the meantime, you can read about region constraints in the type inference section.

The lowering module in rustc

The program clauses described in the lowering rules section are actually created in the rustc_traits::lowering module.

The program_clauses_for query

The main entry point is the program_clauses_for query, which – given a def-id – produces a set of Chalk program clauses. These queries are tested using a dedicated unit-testing mechanism, described below. The query is invoked on a DefId that identifies something like a trait, an impl, or an associated item definition. It then produces and returns a vector of program clauses.

Unit tests

Unit tests are located in src/test/ui/chalkify. A good example test is the lower_impl test. At the time of this writing, it looked like this:

#![feature(rustc_attrs)]

trait Foo { }

#[rustc_dump_program_clauses] //~ ERROR Implemented(T: Foo) :-
impl<T: 'static> Foo for T where T: Iterator<Item = i32> { }

fn main() {
    println!("hello");
}

The #[rustc_dump_program_clauses] annotation can be attached to anything with a def-id. (It requires the rustc_attrs feature.) The compiler will then invoke the program_clauses_for query on that item, and emit compiler errors that dump the clauses produced. These errors just exist for unit-testing, as we can then leverage the standard ui test mechanisms to check them. In this case, there is a //~ ERROR Implemented annotation which is intentionally minimal (it need only be a prefix of the error), but the stderr file contains the full details:

error: Implemented(T: Foo) :- ProjectionEq(<T as std::iter::Iterator>::Item == i32), TypeOutlives(T \
: 'static), Implemented(T: std::iter::Iterator), Implemented(T: std::marker::Sized).
  --> $DIR/lower_impl.rs:15:1
   |
LL | #[rustc_dump_program_clauses] //~ ERROR Implemented(T: Foo) :-
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

Lowering rules

This section gives the complete lowering rules for Rust traits into program clauses. It is a kind of reference. These rules reference the domain goals defined in an earlier section.

Notation

The nonterminal Pi is used to mean some generic parameter, either a named lifetime like 'a or a type parameter like A.

The nonterminal Ai is used to mean some generic argument, which might be a lifetime like 'a or a type like Vec<A>.

When defining the lowering rules, we will give goals and clauses in the notation given in this section. We sometimes insert "macros" like LowerWhereClause! into these definitions; these macros reference other sections within this chapter.

Rule names and cross-references

Each of these lowering rules is given a name, documented with a comment like so:

// Rule Foo-Bar-Baz

The reference implementation of these rules is to be found in chalk/src/rules.rs. They are also ported in rustc in the librustc_traits crate.

Lowering where clauses

When used in a goal position, where clauses can be mapped directly to the Holds variant of domain goals, as follows:

  • A0: Foo<A1..An> maps to Implemented(A0: Foo<A1..An>)
  • T: 'r maps to Outlives(T, 'r)
  • 'a: 'b maps to Outlives('a, 'b)
  • A0: Foo<A1..An, Item = T> is a bit special and expands to two distinct goals, namely Implemented(A0: Foo<A1..An>) and ProjectionEq(<A0 as Foo<A1..An>>::Item = T)

In the rules below, we will use WC to indicate where clauses that appear in Rust syntax; we will then use the same WC to indicate where those where clauses appear as goals in the program clauses that we are producing. In that case, the mapping above is used to convert from the Rust syntax into goals.
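The mapping above can be sketched as a small function. This is an illustrative model, not the rustc or chalk representation: the `WhereClause` enum and the `lower` function are hypothetical, and types, traits, and goals are kept as plain strings.

```rust
/// A where clause as written in Rust syntax (types and traits kept as strings).
enum WhereClause {
    /// `A0: Foo<A1..An>`
    TraitBound { ty: String, trait_ref: String },
    /// `T: 'r` or `'a: 'b`
    Outlives { subject: String, region: String },
    /// `A0: Foo<A1..An, Item = T>`
    AssocTypeBinding { ty: String, trait_ref: String, item: String, value: String },
}

/// Lowers one where clause into the goal(s) it stands for.
fn lower(wc: &WhereClause) -> Vec<String> {
    match wc {
        WhereClause::TraitBound { ty, trait_ref } => {
            vec![format!("Implemented({}: {})", ty, trait_ref)]
        }
        WhereClause::Outlives { subject, region } => {
            vec![format!("Outlives({}, {})", subject, region)]
        }
        // The special case: one where clause expands to two distinct goals.
        WhereClause::AssocTypeBinding { ty, trait_ref, item, value } => vec![
            format!("Implemented({}: {})", ty, trait_ref),
            format!("ProjectionEq(<{} as {}>::{} = {})", ty, trait_ref, item, value),
        ],
    }
}
```

For instance, lowering `T: Iterator<Item = i32>` in this model gives the two goals `Implemented(T: Iterator)` and `ProjectionEq(<T as Iterator>::Item = i32)`.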

Transforming the lowered where clauses

In addition, in the rules below, we sometimes do some transformations on the lowered where clauses, as defined here:

  • FromEnv(WC) – this indicates that:
    • Implemented(TraitRef) becomes FromEnv(TraitRef)
    • other where-clauses are left intact
  • WellFormed(WC) – this indicates that:
    • Implemented(TraitRef) becomes WellFormed(TraitRef)
    • other where-clauses are left intact

TODO: I suspect that we want to alter the outlives relations too, but Chalk isn't modeling those right now.

Lowering traits

Given a trait definition

trait Trait<P1..Pn> // P0 == Self
where WC
{
    // trait items
}

we will produce a number of declarations. This section is focused on the program clauses for the trait header (i.e., the stuff outside the {}); the section on trait items covers the stuff inside the {}.

Trait header

From the trait itself we mostly make "meta" rules that set up the relationships between different kinds of domain goals. The first such rule from the trait header creates the mapping between the FromEnv and Implemented predicates:

// Rule Implemented-From-Env
forall<Self, P1..Pn> {
  Implemented(Self: Trait<P1..Pn>) :- FromEnv(Self: Trait<P1..Pn>)
}

Implied bounds

The next few clauses have to do with implied bounds (see also RFC 2089 and the implied bounds chapter for more in-depth coverage). For each trait, we produce two clauses:

// Rule Implied-Bound-From-Trait
//
// For each where clause WC:
forall<Self, P1..Pn> {
  FromEnv(WC) :- FromEnv(Self: Trait<P1..Pn>)
}

This clause says that if we are assuming that the trait holds, then we can also assume that its where-clauses hold. It's perhaps useful to see an example:

trait Eq: PartialEq { ... }

In this case, the PartialEq supertrait is equivalent to a where Self: PartialEq where clause, in our simplified model. The program clause above therefore states that if we can prove FromEnv(T: Eq) – e.g., if we are in some function with T: Eq in its where clauses – then we also know that FromEnv(T: PartialEq). Thus the set of things that follow from the environment are not only the direct where clauses but also things that follow from them.

The next rule is related; it defines what it means for a trait reference to be well-formed:

// Rule WellFormed-TraitRef
forall<Self, P1..Pn> {
  WellFormed(Self: Trait<P1..Pn>) :- Implemented(Self: Trait<P1..Pn>) && WellFormed(WC)
}

This WellFormed rule states that T: Trait is well-formed if (a) T: Trait is implemented and (b) all the where-clauses declared on Trait are well-formed (and hence they are implemented). Remember that the WellFormed predicate is coinductive; in this case, it is serving as a kind of "carrier" that allows us to enumerate all the where clauses that are transitively implied by T: Trait.

An example:

trait Foo: A + Bar { }
trait Bar: B + Foo { }
trait A { }
trait B { }

Here, the transitive set of implications for T: Foo is T: A, T: Bar, and T: B. And indeed if we were to try to prove WellFormed(T: Foo), we would have to prove each one of those:

  • WellFormed(T: Foo)
    • Implemented(T: Foo)
    • WellFormed(T: A)
      • Implemented(T: A)
    • WellFormed(T: Bar)
      • Implemented(T: Bar)
      • WellFormed(T: B)
        • Implemented(T: B)
      • WellFormed(T: Foo) -- cycle, true coinductively

This WellFormed predicate is only used when proving that impls are well-formed – basically, for each impl of some trait ref TraitRef, we must show that WellFormed(TraitRef). This in turn justifies the implied bounds rules that allow us to extend the set of FromEnv items.
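The proof tree for the Foo / Bar example can be reproduced with a short sketch that collects every Implemented obligation reached while proving WellFormed(T: Foo), treating a repeated WellFormed goal as true. The helper names (`superbounds`, `well_formed_obligations`, `obligations_for`) are hypothetical, and traits are represented by their names only.

```rust
use std::collections::BTreeSet;

/// Direct superbounds of each trait in the example above.
fn superbounds(trait_name: &str) -> &'static [&'static str] {
    match trait_name {
        "Foo" => &["A", "Bar"],
        "Bar" => &["B", "Foo"],
        _ => &[],
    }
}

/// Records in `out` every trait `X` such that `Implemented(T: X)` must be
/// shown in order to prove `WellFormed(T: trait_name)`.
fn well_formed_obligations(
    trait_name: &'static str,
    in_progress: &mut Vec<&'static str>,
    out: &mut BTreeSet<&'static str>,
) {
    if in_progress.contains(&trait_name) {
        // Cycle: `WellFormed` is coinductive, so this branch is simply true.
        return;
    }
    in_progress.push(trait_name);
    out.insert(trait_name);
    for &sup in superbounds(trait_name) {
        well_formed_obligations(sup, in_progress, out);
    }
    in_progress.pop();
}

/// Convenience wrapper returning the obligations in sorted order.
fn obligations_for(trait_name: &'static str) -> Vec<&'static str> {
    let mut out = BTreeSet::new();
    well_formed_obligations(trait_name, &mut Vec::new(), &mut out);
    out.into_iter().collect()
}
```

Starting from `Foo`, the collected set is `A`, `B`, `Bar`, and `Foo` itself, which matches the tree above: the second visit to `Foo` is cut off instead of looping.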

Lowering type definitions

We also want to have some rules which define when a type is well-formed. For example, given this type:

struct Set<K> where K: Hash { ... }

then Set<i32> is well-formed because i32 implements Hash, but Set<NotHash> would not be well-formed. Basically, a type is well-formed if its parameters verify the where clauses written on the type definition.

Hence, for every type definition:

struct Type<P1..Pn> where WC { ... }

we produce the following rule:

// Rule WellFormed-Type
forall<P1..Pn> {
  WellFormed(Type<P1..Pn>) :- WC
}

Note that we use struct for defining a type, but this should be understood as a general type definition (it could be e.g. a generic enum).

Conversely, we define rules which say that if we assume that a type is well-formed, we can also assume that its where clauses hold. That is, we produce the following family of rules:

// Rule Implied-Bound-From-Type
//
// For each where clause `WC`
forall<P1..Pn> {
  FromEnv(WC) :- FromEnv(Type<P1..Pn>)
}

As for the implied bounds RFC, functions will assume that their arguments are well-formed. For example, suppose we have the following bit of code:

trait Hash: Eq { }
struct Set<K: Hash> { ... }

fn foo<K>(collection: Set<K>, x: K, y: K) {
    // `x` and `y` can be compared for equality even though we did not
    // explicitly write `where K: Eq`
    if x == y {
        ...
    }
}

In the foo function, we assume that Set<K> is well-formed, i.e. we have FromEnv(Set<K>) in our environment. Because of the previous rule, we get FromEnv(K: Hash) without needing an explicit where clause. And because of the Hash trait definition, there also exists a rule which says:

forall<K> {
  FromEnv(K: Eq) :- FromEnv(K: Hash)
}

which means that we finally get FromEnv(K: Eq) and then can compare x and y without needing an explicit where clause.
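For comparison, here is a sketch of how such a function is written in Rust as it stands today, on the assumption that the bound implied by the type (`Set<K>` giving `K: Hash`) is not yet derived automatically and so is spelled out by hand. The second step, `K: Hash` giving `K: Eq` through the supertrait, already works. `Hash` here is a local stand-in trait rather than `std::hash::Hash`, and `contains` is a hypothetical function.

```rust
// Local stand-in for the `Hash` trait of the example (not `std::hash::Hash`).
trait Hash: Eq {}

struct Set<K: Hash> {
    items: Vec<K>,
}

impl Hash for u32 {}

// `K: Hash` is written explicitly here. No `K: Eq` is needed: it follows
// from the supertrait declaration `trait Hash: Eq`.
fn contains<K: Hash>(collection: &Set<K>, needle: &K) -> bool {
    collection.items.iter().any(|item| item == needle)
}
```

With the implied-bounds rules described above, the `K: Hash` annotation on `contains` would become redundant as well.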

Lowering trait items

Associated type declarations

Given a trait that declares a (possibly generic) associated type:

trait Trait<P1..Pn> // P0 == Self
where WC
{
    type AssocType<Pn+1..Pm>: Bounds where WC1;
}

We will produce a number of program clauses. The first two define the rules by which ProjectionEq can succeed; these two clauses are discussed in detail in the section on associated types, but reproduced here for reference:

// Rule ProjectionEq-Normalize
//
// ProjectionEq can succeed by normalizing:
forall<Self, P1..Pn, Pn+1..Pm, U> {
  ProjectionEq(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> = U) :-
      Normalize(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> U)
}

// Rule ProjectionEq-Placeholder
//
// ProjectionEq can succeed through the placeholder associated type,
// see "associated type" chapter for more:
forall<Self, P1..Pn, Pn+1..Pm> {
  ProjectionEq(
    <Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm> =
    (Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>
  )
}

The next rule covers implied bounds for the projection. In particular, the Bounds declared on the associated type must have been proven to hold to show that the impl is well-formed, and hence we can rely on them elsewhere.

// Rule Implied-Bound-From-AssocTy
//
// For each `Bound` in `Bounds`:
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(<Self as Trait<P1..Pn>>::AssocType<Pn+1..Pm>: Bound) :-
      FromEnv(Self: Trait<P1..Pn>) && WC1
}

Next, we define the requirements for an instantiation of our associated type to be well-formed...

// Rule WellFormed-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
    WellFormed((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>) :-
      Implemented(Self: Trait<P1..Pn>) && WC1
}

...along with the reverse implications, when we can assume that it is well-formed.

// Rule Implied-WC-From-AssocTy
//
// For each where clause WC1:
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(WC1) :- FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}

// Rule Implied-Trait-From-AssocTy
forall<Self, P1..Pn, Pn+1..Pm> {
    FromEnv(Self: Trait<P1..Pn>) :-
      FromEnv((Trait::AssocType)<Self, P1..Pn, Pn+1..Pm>)
}

Lowering function and constant declarations

Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. See the section on function/constant values below for more details.

Lowering impls

Given an impl of a trait:

impl<P0..Pn> Trait<A1..An> for A0
where WC
{
    // zero or more impl items
}

Let TraitRef be the trait reference A0: Trait<A1..An>. Then we will create the following rules:

// Rule Implemented-From-Impl
forall<P0..Pn> {
  Implemented(TraitRef) :- WC
}

In addition, we will lower all of the impl items.

Lowering impl items

Associated type values

Given an impl that contains:

impl<P0..Pn> Trait<P1..Pn> for P0
where WC_impl
{
    type AssocType<Pn+1..Pm> = T;
}

and our where clause WC1 on the trait associated type from above, we produce the following rule:

// Rule Normalize-From-Impl
forall<P0..Pm> {
  forall<Pn+1..Pm> {
    Normalize(<P0 as Trait<P1..Pn>>::AssocType<Pn+1..Pm> -> T) :-
      Implemented(P0 as Trait) && WC1
  }
}

Note that WC_impl and WC1 both encode where-clauses that the impl can rely on. (WC_impl is not used here, because it is implied by Implemented(P0 as Trait).)

Function and constant values

Chalk didn't model functions and constants, but I would eventually like to treat them exactly like normalization. This presumably involves adding a new kind of parameter (constant), and then having a NormalizeValue domain goal. This is to be written because the details are a bit up in the air.

Well-formedness checking

WF checking has the job of checking that the various declarations in a Rust program are well-formed. This is the basis for implied bounds, and partly for that reason, this checking can be surprisingly subtle! For example, we have to be sure that each impl proves the WF conditions declared on the trait.

For each declaration in a Rust program, we will generate a logical goal and try to prove it using the lowered rules we described in the lowering rules chapter. If we are able to prove it, we say that the construct is well-formed. If not, we report an error to the user.

Well-formedness checking happens in the src/rules/wf.rs module in chalk. After you have read this chapter, you may find it useful to look at the extended set of examples in the src/rules/wf/test.rs submodule.

The new-style WF checking has not been implemented in rustc yet.

We give here a complete reference of the generated goals for each Rust declaration.

In addition to the notations introduced in the chapter about lowering rules, we'll introduce another notation: when checking WF of a declaration, we'll often have to prove that all types that appear are well-formed, except type parameters that we always assume to be WF. Hence, we'll use the following notation: for a type SomeType<...>, we define InputTypes(SomeType<...>) to be the set of all non-parameter types appearing in SomeType<...>, including SomeType<...> itself.

Examples:

  • InputTypes((u32, f32)) = [u32, f32, (u32, f32)]
  • InputTypes(Box<T>) = [Box<T>] (assuming that T is a type parameter)
  • InputTypes(Box<Box<T>>) = [Box<T>, Box<Box<T>>]

We also extend the InputTypes notation to where clauses in the natural way. So, for example InputTypes(A0: Trait<A1,...,An>) is the union of InputTypes(A0), InputTypes(A1), ..., InputTypes(An).
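The definition of InputTypes can be made concrete with a minimal sketch over a toy type representation. The `Ty` enum and the functions `collect` and `input_types` are hypothetical and exist only for this illustration; they are not chalk's or rustc's data structures.

```rust
/// A deliberately tiny, hypothetical model of Rust types.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    /// A type parameter such as `T`; always assumed to be well-formed.
    Param(&'static str),
    /// A named type applied to arguments, e.g. `u32` or `Box<T>`.
    Apply(&'static str, Vec<Ty>),
    /// A tuple type such as `(u32, f32)`.
    Tuple(Vec<Ty>),
}

/// Appends the non-parameter types appearing in `ty` to `out`,
/// inner types first and `ty` itself last, without duplicates.
fn collect(ty: &Ty, out: &mut Vec<Ty>) {
    match ty {
        // Type parameters are skipped: they are assumed to be WF.
        Ty::Param(_) => {}
        Ty::Apply(_, args) | Ty::Tuple(args) => {
            for arg in args {
                collect(arg, out);
            }
            if !out.contains(ty) {
                out.push(ty.clone());
            }
        }
    }
}

/// Computes `InputTypes(ty)` as an ordered list.
fn input_types(ty: &Ty) -> Vec<Ty> {
    let mut out = Vec::new();
    collect(ty, &mut out);
    out
}
```

For `Box<Box<T>>` this returns `Box<T>` followed by `Box<Box<T>>`, and for a bare type parameter it returns an empty list.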

Type definitions

Given a general type definition:

struct Type<P...> where WC_type {
    field1: A1,
    ...
    fieldn: An,
}

we generate the following goal, which represents its well-formedness condition:

forall<P...> {
    if (FromEnv(WC_type)) {
        WellFormed(InputTypes(WC_type)) &&
            WellFormed(InputTypes(A1)) &&
            ...
            WellFormed(InputTypes(An))
    }
}

which in English states: assuming that the where clauses defined on the type hold, prove that every type appearing in the type definition is well-formed.

Some examples:

struct OnlyClone<T> where T: Clone {
    clonable: T,
}
// The only types appearing are type parameters: we have nothing to check,
// the type definition is well-formed.

struct Foo<T> where T: Clone {
    foo: OnlyClone<T>,
}
// The only non-parameter type which appears in this definition is
// `OnlyClone<T>`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(T: Clone)) {
//          WellFormed(OnlyClone<T>)
//     }
// }
// ```
// which is provable.

struct Bar<T> where <T as Iterator>::Item: Debug {
    bar: i32,
}
// The only non-parameter types which appear in this definition are
// `<T as Iterator>::Item` and `i32`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(<T as Iterator>::Item: Debug)) {
//          WellFormed(<T as Iterator>::Item) &&
//               WellFormed(i32)
//     }
// }
// ```
// which is not provable since `WellFormed(<T as Iterator>::Item)` requires
// proving `Implemented(T: Iterator)`, and we are unable to prove that for an
// unknown `T`.
//
// Hence, this type definition is considered illegal. An additional
// `where T: Iterator` would make it legal.

Trait definitions

Given a general trait definition:

trait Trait<P1...> where WC_trait {
    type Assoc<P2...>: Bounds_assoc where WC_assoc;
}

we generate the following goal:

forall<P1...> {
    if (FromEnv(WC_trait)) {
        WellFormed(InputTypes(WC_trait)) &&

            forall<P2...> {
                if (FromEnv(WC_assoc)) {
                    WellFormed(InputTypes(Bounds_assoc)) &&
                        WellFormed(InputTypes(WC_assoc))
                }
            }
    }
}

There is not much to verify in a trait definition. We just want to prove that the types appearing in the trait definition are well-formed, under the assumption that the different where clauses hold.

Some examples:

trait Foo<T> where T: Iterator, <T as Iterator>::Item: Debug {
    ...
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
//     if (FromEnv(T: Iterator), FromEnv(<T as Iterator>::Item: Debug)) {
//         WellFormed(<T as Iterator>::Item)
//     }
// }
// ```
// which is provable thanks to the `FromEnv(T: Iterator)` assumption.

trait Bar {
    type Assoc<T>: From<<T as Iterator>::Item>;
}
// The only non-parameter type which appears in this definition is
// `<T as Iterator>::Item`. The generated goal is the following:
// ```
// forall<T> {
//     WellFormed(<T as Iterator>::Item)
// }
// ```
// which is not provable, hence the trait definition is considered illegal.

trait Baz {
    type Assoc<T>: From<<T as Iterator>::Item> where T: Iterator;
}
// The generated goal is now:
// ```
// forall<T> {
//     if (FromEnv(T: Iterator)) {
//         WellFormed(<T as Iterator>::Item)
//     }
// }
// ```
// which is now provable.

Impls

Now we give ourselves a general impl for the trait defined above:

impl<P1...> Trait<A1...> for SomeType<A2...> where WC_impl {
    type Assoc<P2...> = SomeValue<A3...> where WC_assoc;
}

Note that here, WC_assoc are the same where clauses as those defined on the associated type definition in the trait declaration, except that type parameters from the trait are substituted with values provided by the impl (see example below). You cannot add new where clauses. You may omit the where clauses if you want to emphasize the fact that you are not actually relying on them.

Some examples to illustrate that:

trait Foo<T> {
    type Assoc where T: Clone;
}

struct OnlyClone<T: Clone> { ... }

impl<U> Foo<Option<U>> for () {
    // We substitute type parameters from the trait by the ones provided
    // by the impl, that is instead of having a `T: Clone` where clause,
    // we have an `Option<U>: Clone` one.
    type Assoc = OnlyClone<Option<U>> where Option<U>: Clone;
}

impl<T> Foo<T> for i32 {
    // I'm not using the `T: Clone` where clause from the trait, so I can
    // omit it.
    type Assoc = u32;
}

impl<T> Foo<T> for f32 {
    type Assoc = OnlyClone<Option<T>> where Option<T>: Clone;
    //                                ^^^^^^^^^^^^^^^^^^^^^^
    //                                this where clause does not exist
    //                                on the original trait decl: illegal
}

So in Rust, where clauses on associated types work exactly like where clauses on trait methods: in an impl, we must substitute the parameters from the traits with values provided by the impl, we may omit them if we don't need them, but we cannot add new where clauses.
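The trait-method side of this analogy can be written in Rust as it exists today. The following is a sketch with hypothetical names (`Describe`, `Verbose`, `Terse`): one impl repeats the trait's where clause after substitution, the other omits it because it does not rely on it.

```rust
trait Describe<T> {
    fn describe(&self, value: T) -> String
    where
        T: Clone;
}

struct Verbose;
struct Terse;

impl<U: std::fmt::Debug> Describe<Option<U>> for Verbose {
    // The trait's `T: Clone` becomes `Option<U>: Clone` once `T` is
    // substituted by the value provided by the impl.
    fn describe(&self, value: Option<U>) -> String
    where
        Option<U>: Clone,
    {
        format!("{:?}", value.clone())
    }
}

impl<T> Describe<T> for Terse {
    // This impl does not rely on `T: Clone`, so the clause is omitted.
    fn describe(&self, _value: T) -> String {
        String::from("terse")
    }
}
```

Adding a where clause in an impl that the trait method does not declare is rejected by the compiler, which corresponds to the third, illegal impl in the example above.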

Now let's see the generated goal for this general impl:

forall<P1...> {
    // Well-formedness of types appearing in the impl
    if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
        WellFormed(InputTypes(WC_impl)) &&

            forall<P2...> {
                if (FromEnv(WC_assoc)) {
                        WellFormed(InputTypes(SomeValue<A3...>))
                }
            }
    }

    // Implied bounds checking
    if (FromEnv(WC_impl), FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) {
        WellFormed(SomeType<A2...>: Trait<A1...>) &&

            forall<P2...> {
                if (FromEnv(WC_assoc)) {
                    WellFormed(SomeValue<A3...>: Bounds_assoc)
                }
            }
    }
}

Here is the most complex goal. As always, first, assuming that the various where clauses hold, we prove that every type appearing in the impl is well-formed, except types appearing in the impl header SomeType<A2...>: Trait<A1...>. Instead, we assume that those types are well-formed (hence the if (FromEnv(InputTypes(SomeType<A2...>: Trait<A1...>))) conditions). This is part of the implied bounds proposal, so that we can rely on the bounds written on the definition of e.g. the SomeType<A2...> type (and that we don't need to repeat those bounds).

Note that we don't need to check well-formedness of types appearing in WC_assoc because we already did that in the trait decl (they are just repeated with some substitutions of values which we already assume to be well-formed).

Next, still assuming that the where clauses on the impl WC_impl hold and that the input types of SomeType<A2...> are well-formed, we prove that WellFormed(SomeType<A2...>: Trait<A1...>) holds. That is, we want to prove that SomeType<A2...> satisfies all the where clauses that might transitively be required by the Trait definition (see this subsection).

Lastly, assuming in addition that the where clauses on the associated type WC_assoc hold, we prove that WellFormed(SomeValue<A3...>: Bounds_assoc) holds. Again, we are not only proving Implemented(SomeValue<A3...>: Bounds_assoc), but also all the facts that might transitively come from Bounds_assoc. We must do this because we allow the use of implied bounds on associated types: if we have FromEnv(SomeType: Trait) in our environment, the lowering rules chapter indicates that we are able to deduce FromEnv(<SomeType as Trait>::Assoc: Bounds_assoc) without knowing what the precise value of <SomeType as Trait>::Assoc is.

Some examples for the generated goal:

// Trait Program Clauses

// These are program clauses that come from the trait definitions below
// and that the trait solver can use for its reasonings. I'm just restating
// them here so that we have them in mind.

trait Copy { }
// `WellFormed(Self: Copy) :- Implemented(Self: Copy).`

trait Partial where Self: Copy { }
// ```
// WellFormed(Self: Partial) :-
//     Implemented(Self: Partial) &&
//     WellFormed(Self: Copy).
// ```

trait Complete where Self: Partial { }
// ```
// WellFormed(Self: Complete) :-
//     Implemented(Self: Complete) &&
//     WellFormed(Self: Partial).
// ```

// Impl WF Goals

impl<T> Partial for T where T: Complete { }
// The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(T: Complete)) {
//         WellFormed(T: Partial)
//     }
// }
// ```
// Then proving `WellFormed(T: Partial)` amounts to proving
// `Implemented(T: Partial)` and `Implemented(T: Copy)`.
// Both those facts can be deduced from the `FromEnv(T: Complete)` in our
// environment: this impl is legal.

impl<T> Complete for T { }
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(T: Complete)
// }
// ```
// Then proving `WellFormed(T: Complete)` amounts to proving
// `Implemented(T: Complete)`, `Implemented(T: Partial)` and
// `Implemented(T: Copy)`.
//
// `Implemented(T: Complete)` can be proved thanks to the
// `impl<T> Complete for T` blanket impl.
//
// `Implemented(T: Partial)` can be proved thanks to the
// `impl<T> Partial for T where T: Complete` impl and because we know
// `T: Complete` holds.
//
// However, `Implemented(T: Copy)` cannot be proved: the impl is illegal.
// An additional `where T: Copy` bound would be sufficient to make that impl
// legal.

trait Bar { }

impl<T> Bar for T where <T as Iterator>::Item: Bar { }
// We have a non-parameter type appearing in the where clauses:
// `<T as Iterator>::Item`. The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(<T as Iterator>::Item: Bar)) {
//         WellFormed(T: Bar) &&
//             WellFormed(<T as Iterator>::Item: Bar)
//     }
// }
// ```
// And `WellFormed(<T as Iterator>::Item: Bar)` is not provable: we'd need
// an additional `where T: Iterator` for example.

trait Foo { }

trait Bar {
    type Item: Foo;
}

struct Stuff<T> { }

impl<T> Bar for Stuff<T> where T: Foo {
    type Item = T;
}
// The generated goal is:
// ```
// forall<T> {
//     if (FromEnv(T: Foo)) {
//         WellFormed(T: Foo).
//     }
// }
// ```
// which is provable.

trait Debug { ... }
// `WellFormed(Self: Debug) :- Implemented(Self: Debug).`

struct Box<T> { ... }
impl<T> Debug for Box<T> where T: Debug { ... }

trait PointerFamily {
    type Pointer<T>: Debug where T: Debug;
}
// `WellFormed(Self: PointerFamily) :- Implemented(Self: PointerFamily).`

struct BoxFamily;

impl PointerFamily for BoxFamily {
    type Pointer<T> = Box<T> where T: Debug;
}
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(BoxFamily: PointerFamily) &&
//
//     if (FromEnv(T: Debug)) {
//         WellFormed(Box<T>: Debug) &&
//             WellFormed(Box<T>)
//     }
// }
// ```
// `WellFormed(BoxFamily: PointerFamily)` amounts to proving
// `Implemented(BoxFamily: PointerFamily)`, which is ok thanks to our impl.
//
// `WellFormed(Box<T>)` is always true (there are no where clauses on the
// `Box` type definition).
//
// Moreover, we have an `impl<T: Debug> Debug for Box<T>`, hence
// we can prove `WellFormed(Box<T>: Debug)` and the impl is indeed legal.

trait Foo {
    type Assoc<T>;
}

struct OnlyClone<T: Clone> { ... }

impl Foo for i32 {
    type Assoc<T> = OnlyClone<T>;
}
// The generated goal is:
// ```
// forall<T> {
//     WellFormed(i32: Foo) &&
//        WellFormed(OnlyClone<T>)
// }
// ```
// however `WellFormed(OnlyClone<T>)` is not provable because it requires
// `Implemented(T: Clone)`. It would be tempting to just add a `where T: Clone`
// bound inside the `impl Foo for i32` block, however we saw that it was
// illegal to add where clauses that didn't come from the trait definition.

Canonical queries

The "start" of the trait system is the canonical query (these are both queries in the more general sense of the word – something you would like to know the answer to – and in the rustc-specific sense). The idea is that the type checker, or other parts of the system, may in the course of their work want to know whether some trait is implemented for some type (e.g., is u32: Debug true?). Or they may want to normalize some associated type.

This section covers queries at a fairly high level of abstraction. The subsections look a bit more closely at how these ideas are implemented in rustc.

The traditional, interactive Prolog query

In a traditional Prolog system, when you start a query, the solver will run off and start supplying you with every possible answer it can find. So given something like this:

?- Vec<i32>: AsRef<?U>

The solver might answer:

Vec<i32>: AsRef<[i32]>
    continue? (y/n)

This continue bit is interesting. The idea in Prolog is that the solver is finding all possible instantiations of your query that are true. In this case, if we instantiate ?U = [i32], then the query is true (note that a traditional Prolog interface does not, directly, tell us a value for ?U, but we can infer one by unifying the response with our original query – Rust's solver gives back a substitution instead). If we were to hit y, the solver might then give us another possible answer:

Vec<i32>: AsRef<Vec<i32>>
    continue? (y/n)

This answer derives from the fact that there is a reflexive impl (impl<T> AsRef<T> for T) for AsRef. If we were to hit y again, then we might get back a negative response:

no

Naturally, in some cases, there may be no possible answers, and hence the solver will just give me back no right away:

?- Box<i32>: Copy
    no

In some cases, there might be an infinite number of responses. So for example if I gave this query, and I kept hitting y, then the solver would never stop giving me back answers:

?- Vec<?U>: Clone
    Vec<i32>: Clone
        continue? (y/n)
    Vec<Box<i32>>: Clone
        continue? (y/n)
    Vec<Box<Box<i32>>>: Clone
        continue? (y/n)
    Vec<Box<Box<Box<i32>>>>: Clone
        continue? (y/n)

As you can imagine, the solver will gleefully keep adding another layer of Box until we ask it to stop, or it runs out of memory.
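The "keep asking for more" behaviour resembles a lazy, unbounded iterator. As a rough sketch (the function `answers` is hypothetical, and a real solver derives its answers from impls rather than generating strings), the stream of answers to `Vec<?U>: Clone` shown above could be modelled like this:

```rust
/// Lazily yields `Vec<i32>: Clone`, `Vec<Box<i32>>: Clone`, and so on,
/// adding one more layer of `Box` for every answer that is requested.
fn answers() -> impl Iterator<Item = String> {
    std::iter::successors(Some(String::from("i32")), |inner| {
        Some(format!("Box<{}>", inner))
    })
    .map(|inner| format!("Vec<{}>: Clone", inner))
}
```

Nothing is computed until the caller pulls an item, so `answers().take(3)` plays the role of hitting y twice and then stopping.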

Another interesting thing is that queries might still have variables in them. For example:

?- Rc<?T>: Clone

might produce the answer:

Rc<?T>: Clone
    continue? (y/n)

After all, Rc<?T>: Clone is true no matter what type ?T is.

A trait query in rustc

The trait queries in rustc work somewhat differently. Instead of trying to enumerate all possible answers for you, they are looking for an unambiguous answer. In particular, when they tell you the value for a type variable, that means that this is the only possible instantiation that you could use, given the current set of impls and where-clauses, that would be provable. (Internally within the solver, though, they can potentially enumerate all possible answers. See the description of the SLG solver for details.)

The response to a trait query in rustc is typically a Result<QueryResult<T>, NoSolution> (where the T will vary a bit depending on the query itself). The Err(NoSolution) case indicates that the query was false and had no answers (e.g., Box<i32>: Copy). Otherwise, the QueryResult gives back information about the possible answer(s) we did find. It consists of four parts:

  • Certainty: tells you how sure we are of this answer. It can have two values:
    • Proven means that the result is known to be true.
      • This might be the result for trying to prove Vec<i32>: Clone, say, or Rc<?T>: Clone.
    • Ambiguous means that there were things we could not yet prove to be either true or false, typically because more type information was needed. (We'll see an example shortly.)
      • This might be the result for trying to prove Vec<?T>: Clone.
  • Var values: Values for each of the unbound inference variables (like ?T) that appeared in your original query. (Remember that in Prolog, we had to infer these.)
    • As we'll see in the example below, we can get back var values even for Ambiguous cases.
  • Region constraints: these are relations that must hold between the lifetimes that you supplied as inputs. We'll ignore these here, but see the section on handling regions in traits for more details.
  • Value: The query result also comes with a value of type T. For some specialized queries – like normalizing associated types – this is used to carry back an extra result, but it's often just ().
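The four parts listed above can be pictured with the following simplified stand-ins. These are not rustc's actual definitions: the field types are plain strings chosen for illustration, and `describe` is a hypothetical helper showing how a caller might react to each outcome.

```rust
/// How sure the solver is about the answer.
#[derive(Debug, PartialEq)]
enum Certainty {
    Proven,
    Ambiguous,
}

/// The query was false and had no answers.
#[derive(Debug, PartialEq)]
struct NoSolution;

/// Simplified query result; `T` is the query-specific value, often `()`.
#[derive(Debug, PartialEq)]
struct QueryResult<T> {
    certainty: Certainty,
    /// One (variable, value) pair per unbound inference variable.
    var_values: Vec<(String, String)>,
    /// Relations that must hold between the input lifetimes.
    region_constraints: Vec<String>,
    value: T,
}

/// Hypothetical helper: summarizes what a caller should do next.
fn describe<T>(response: &Result<QueryResult<T>, NoSolution>) -> &'static str {
    match response {
        Err(NoSolution) => "false: no answers",
        Ok(QueryResult { certainty: Certainty::Proven, .. }) => "proven",
        Ok(QueryResult { certainty: Certainty::Ambiguous, .. }) => {
            "ambiguous: retry later"
        }
    }
}
```

Note that `var_values` is present in both the Proven and the Ambiguous case, so a caller can still learn something about its variables from an ambiguous answer.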

Examples

Let's work through an example query to see what all the parts mean. Consider the Borrow trait. This trait has a number of impls; among them, there are these two (for clarity, I've written the Sized bounds explicitly):

impl<T> Borrow<T> for T where T: ?Sized
impl<T> Borrow<[T]> for Vec<T> where T: Sized

Example 1. Imagine we are type-checking this (rather artificial) bit of code:

fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // Example 1: requires `Vec<?T>: Borrow<?U>`
    ...
}

As the comments indicate, we first create two variables t and u; t is an empty vector and u is a None option. Both of these variables have unbound inference variables in their type: ?T represents the elements in the vector t and ?U represents the value stored in the option u. Next, we invoke foo; comparing the signature of foo to its arguments, we wind up with A = Vec<?T> and B = ?U. Therefore, the where clause on foo requires that Vec<?T>: Borrow<?U>. This is thus our first example trait query.

There are many possible solutions to the query Vec<?T>: Borrow<?U>; for example:

  • ?U = Vec<?T>,
  • ?U = [?T],
  • ?T = u32, ?U = [u32]
  • and so forth.

Therefore, the result we get back would be as follows (I'm going to ignore region constraints and the "value"):

  • Certainty: Ambiguous – we're not sure yet if this holds
  • Var values: [?T = ?T, ?U = ?U] – we learned nothing about the values of the variables

In short, the query result says that it is too soon to say much about whether this trait is proven. During type-checking, this is not an immediate error: instead, the type checker would hold on to this requirement (Vec<?T>: Borrow<?U>) and wait. As we'll see in the next example, it may happen that ?T and ?U wind up constrained from other sources, in which case we can try the trait query again.

Example 2. We can now extend our previous example a bit, and assign a value to u:

fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    // What we saw before:
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // `Vec<?T>: Borrow<?U>` => ambiguous

    // New stuff:
    u = Some(vec![]); // ?U = Vec<?V>
}

As a result of this assignment, the type of u is forced to be Option<Vec<?V>>, where ?V represents the element type of the vector. This in turn implies that ?U is unified to Vec<?V>.

Let's suppose that the type checker decides to revisit the "as-yet-unproven" trait obligation we saw before, Vec<?T>: Borrow<?U>. ?U is no longer an unbound inference variable; it now has a value, Vec<?V>. So, if we "refresh" the query with that value, we get:

Vec<?T>: Borrow<Vec<?V>>

This time, there is only one impl that applies, the reflexive impl:

impl<T> Borrow<T> for T where T: ?Sized

Therefore, the trait checker will answer:

  • Certainty: Proven
  • Var values: [?T = ?T, ?V = ?T]

Here, it is saying that we have indeed proven that the obligation holds, and we also know that ?T and ?V are the same type (but we don't know what that type is yet!).

(In fact, as the function ends here, the type checker would give an error at this point, since the element types of t and u are still not yet known, even though they are known to be the same.)

Canonicalization

Canonicalization is the process of isolating an inference value from its context. It is a key part of implementing canonical queries, and you may wish to read the parent chapter to get more context.

Canonicalization is really based on a very simple concept: every inference variable is always in one of two states: either it is unbound, in which case we don't know yet what type it is, or it is bound, in which case we do. So to isolate some data-structure T that contains types/regions from its environment, we just walk down and find the unbound variables that appear in T; those variables get replaced with "canonical variables", starting from zero and numbered in a fixed order (left to right, for the most part, but really it doesn't matter as long as it is consistent).

So, for example, if we have the type X = (?T, ?U), where ?T and ?U are distinct, unbound inference variables, then the canonical form of X would be (?0, ?1), where ?0 and ?1 represent these canonical placeholders. Note that the type Y = (?U, ?T) also canonicalizes to (?0, ?1). But the type Z = (?T, ?T) would canonicalize to (?0, ?0) (as would (?U, ?U)). In other words, the exact identity of the inference variables is not important – unless they are repeated.

We use this to improve caching as well as to detect cycles and other things during trait resolution. Roughly speaking, the idea is that if two trait queries have the same canonical form, then they will get the same answer. That answer will be expressed in terms of the canonical variables (?0, ?1), which we can then map back to the original variables (?T, ?U).
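The renumbering itself is simple enough to sketch. The `QueryTy` enum and `canonicalize` function below are hypothetical and much smaller than what rustc does (no lifetimes, no bound variables): unbound variables are replaced by canonical variables in order of first appearance, and the list of original variables is returned alongside so the answer can be mapped back.

```rust
#[derive(Clone, Debug, PartialEq)]
enum QueryTy {
    /// An unbound inference variable such as `?T`.
    Infer(&'static str),
    /// A canonical variable such as `?0`.
    Canonical(usize),
    /// A tuple such as `(?T, ?U)`.
    Tuple(Vec<QueryTy>),
}

/// Replaces each distinct inference variable with a canonical variable,
/// numbered from zero in left-to-right order of first appearance, and
/// records the original variable for each canonical index in `original`.
fn walk(ty: &QueryTy, original: &mut Vec<&'static str>) -> QueryTy {
    match ty {
        QueryTy::Infer(name) => {
            let index = match original.iter().position(|seen| seen == name) {
                Some(index) => index,
                None => {
                    original.push(*name);
                    original.len() - 1
                }
            };
            QueryTy::Canonical(index)
        }
        QueryTy::Canonical(index) => QueryTy::Canonical(*index),
        QueryTy::Tuple(elems) => {
            QueryTy::Tuple(elems.iter().map(|e| walk(e, original)).collect())
        }
    }
}

/// Returns the canonical form together with the original variables.
fn canonicalize(ty: &QueryTy) -> (QueryTy, Vec<&'static str>) {
    let mut original = Vec::new();
    let canonical = walk(ty, &mut original);
    (canonical, original)
}
```

Both `(?T, ?U)` and `(?U, ?T)` come out as `(?0, ?1)` and differ only in the returned list of original variables, which is what makes the canonical form usable as a cache key.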

Canonicalizing the query

To see how it works, imagine that we are asking to solve the following trait query: ?A: Foo<'static, ?B>, where ?A and ?B are unbound. This query contains two unbound variables, but it also contains the lifetime 'static. The trait system generally ignores all lifetimes and treats them equally, so when canonicalizing, we will also replace any free lifetime with a canonical variable (Note that 'static is actually a free lifetime variable here. We are not considering it in the typing context of the whole program but only in the context of this trait reference. Mathematically, we are not quantifying over the whole program, but only this obligation). Therefore, we get the following result:

?0: Foo<'?1, ?2>

Sometimes we write this differently, like so:

for<T,L,T> { ?0: Foo<'?1, ?2> }

This for<> gives some information about each of the canonical variables within. In this case, each T indicates a type variable, so ?0 and ?2 are types; the L indicates a lifetime variable, so ?1 is a lifetime. The canonicalize method also gives back a CanonicalVarValues array OV with the "original values" for each canonicalized variable:

[?A, 'static, ?B]

We'll need this vector OV later, when we process the query response.

Executing the query

Once we've constructed the canonical query, we can try to solve it. To do so, we will wind up creating a fresh inference context and instantiating the canonical query in that context. The idea is that we create a substitution S from the canonical form containing a fresh inference variable (of suitable kind) for each canonical variable. So, for our example query:

for<T,L,T> { ?0: Foo<'?1, ?2> }

the substitution S might be:

S = [?A, '?B, ?C]

We can then replace the bound canonical variables (?0, etc) with these inference variables, yielding the following fully instantiated query:

?A: Foo<'?B, ?C>

Remember that substitution S though! We're going to need it later.
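A minimal sketch of this instantiation step, under the assumption that a query is modelled as a small term tree: `Kind`, `Term`, `fresh_substitution` and `instantiate` are hypothetical names, and inference variables are numbered rather than lettered. The query `?0: Foo<'?1, ?2>` is represented as `Foo` applied to its three canonical variables, with the self type first.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Type,
    Lifetime,
}

#[derive(Clone, Debug, PartialEq)]
enum Term {
    /// A canonical variable such as `?0`.
    Canonical(usize),
    /// A fresh inference variable of the given kind.
    Infer(Kind, u32),
    /// A name applied to arguments, e.g. `Foo` with its parameters.
    Apply(&'static str, Vec<Term>),
}

/// Builds the substitution S: one fresh inference variable, of suitable
/// kind, for each canonical variable listed in the `for<..>` binder.
fn fresh_substitution(kinds: &[Kind], next_id: &mut u32) -> Vec<Term> {
    kinds
        .iter()
        .map(|&kind| {
            let var = Term::Infer(kind, *next_id);
            *next_id += 1;
            var
        })
        .collect()
}

/// Replaces every canonical variable in `term` by its entry in `subst`.
fn instantiate(term: &Term, subst: &[Term]) -> Term {
    match term {
        Term::Canonical(index) => subst[*index].clone(),
        Term::Infer(kind, id) => Term::Infer(*kind, *id),
        Term::Apply(name, args) => Term::Apply(
            *name,
            args.iter().map(|arg| instantiate(arg, subst)).collect(),
        ),
    }
}
```

Keeping S as an ordinary vector indexed by canonical variable is what allows it to be looked up again later, once solving has unified its entries with other things.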

OK, now that we have a fresh inference context and an instantiated query, we can go ahead and try to solve it. The trait solver itself is explained in more detail in another section, but suffice to say that it will compute a certainty value (Proven or Ambiguous) and have side-effects on the inference variables we've created. For example, if there were only one impl of Foo, like so:

impl<'a, X> Foo<'a, X> for Vec<X>
where X: 'a
{ ... }

then we might wind up with a certainty value of Proven, as well as creating fresh inference variables '?D and ?E (to represent the parameters on the impl) and unifying as follows:

  • '?B = '?D
  • ?A = Vec<?E>
  • ?C = ?E

We would also accumulate the region constraint ?E: '?D, due to the where clause.

In order to create our final query result, we have to "lift" these values out of the query's inference context and into something that can be reapplied in our original inference context. We do that by re-applying canonicalization, but to the query result.

Canonicalizing the query result

As discussed in the parent section, most trait queries wind up with a result that brings together a "certainty value" certainty, a result substitution var_values, and some region constraints. To create this, we wind up re-using the substitution S that we created when first instantiating our query. To refresh your memory, we had a query

for<T,L,T> { ?0: Foo<'?1, ?2> }

for which we made a substitution S:

S = [?A, '?B, ?C]

We then did some work which unified some of those variables with other things. If we "refresh" S with the latest results, we get:

S = [Vec<?E>, '?D, ?E]

These are precisely the new values for the three input variables from our original query. Note though that they include some new variables (like ?E). We can make those go away by canonicalizing again! We don't just canonicalize S, though, we canonicalize the whole query response QR:

QR = {
    certainty: Proven,             // or whatever
    var_values: [Vec<?E>, '?D, ?E] // this is S
    region_constraints: [?E: '?D], // from the impl
    value: (),                     // for our purposes, just (), but
                                   // in some cases this might have
                                   // a type or other info
}

The result would be as follows:

Canonical(QR) = for<T, L> {
    certainty: Proven,
    var_values: [Vec<?0>, '?1, ?0],
    region_constraints: [?0: '?1],
    value: (),
}

(One subtle point: when we canonicalize the query result, we do not use any special treatment for free lifetimes. Note that both references to '?D, for example, were converted into the same canonical variable (?1). This is in contrast to the original query, where we canonicalized every free lifetime into a fresh canonical variable.)
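As a rough sketch of this second canonicalization, the code below numbers each distinct inference variable in order of first appearance, so that repeated occurrences share one canonical variable. The Term and Canonicalizer types are hypothetical, and lifetimes are modeled as ordinary named variables for simplicity.

```rust
use std::collections::HashMap;

// Toy model only: these types are illustrative stand-ins, not rustc's.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    /// A bound canonical variable such as `?0`.
    Canonical(usize),
    /// A named inference variable such as `?E` or `'?D`.
    Infer(&'static str),
    /// A constructor applied to arguments, e.g. `Vec<..>`.
    App(&'static str, Vec<Term>),
}

/// Remembers which canonical index each inference variable was given.
#[derive(Default)]
struct Canonicalizer {
    indices: HashMap<&'static str, usize>,
}

impl Canonicalizer {
    /// Replace every inference variable with a canonical variable. A variable
    /// that was seen before reuses its index; a new one gets the next index.
    fn canonicalize(&mut self, term: &Term) -> Term {
        match term {
            Term::Infer(name) => {
                let next = self.indices.len();
                let index = *self.indices.entry(*name).or_insert(next);
                Term::Canonical(index)
            }
            Term::Canonical(_) => term.clone(),
            Term::App(name, args) => {
                Term::App(*name, args.iter().map(|a| self.canonicalize(a)).collect())
            }
        }
    }
}
```

Running the same Canonicalizer over var_values and then over the region constraints is what keeps the two fields consistent with each other.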

Now, this result must be reapplied in each context where needed.

Processing the canonicalized query result

In the previous section we produced a canonical query result. We now have to apply that result in our original context. If you recall, way back in the beginning, we were trying to prove this query:

?A: Foo<'static, ?B>

We canonicalized that into this:

for<T,L,T> { ?0: Foo<'?1, ?2> }

and now we got back a canonical response:

for<T, L> {
    certainty: Proven,
    var_values: [Vec<?0>, '?1, ?0],
    region_constraints: [?0: '?1],
    value: (),
}

We now want to apply that response to our context. Conceptually, how we do that is to (a) instantiate each of the canonical variables in the result with a fresh inference variable, (b) unify the values in the result with the original values, and then (c) record the region constraints for later. Doing step (a) would yield a result of

{
      certainty: Proven,
      var_values: [Vec<?C>, '?D, ?C]
                       ^^   ^^^ fresh inference variables
      region_constraints: [?C: '?D],
      value: (),
}

Step (b) would then unify:

?A with Vec<?C>
'static with '?D
?B with ?C

And finally the region constraint of ?C: 'static would be recorded for later verification.

(What we actually do is a mildly optimized variant of that: rather than eagerly instantiating all of the canonical values in the result with variables, we instead walk the vector of values, looking for cases where the value is just a canonical variable. In our example, values[2] is ?C, so that means we can deduce that ?C := ?B and '?D := 'static. This gives us a partial set of values. For anything for which we do not find a value, we create an inference variable.)
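The optimized variant can be sketched as follows. This is an illustration under simplifying assumptions, not the compiler's code: Term and guess_substitution are hypothetical names, and the fresh-variable source is passed in as a closure.

```rust
// Toy model only: these types are illustrative stand-ins, not rustc's.
#[derive(Clone, Debug, PartialEq)]
enum Term {
    /// A bound canonical variable in the query result, such as `?0`.
    Canonical(usize),
    /// An inference variable in the original context.
    Infer(u32),
    /// The `'static` lifetime.
    Static,
    /// A constructor applied to arguments, e.g. `Vec<..>`.
    App(&'static str, Vec<Term>),
}

/// Walk the result values next to the original values. Wherever a result
/// value is just a canonical variable, bind that variable to the original
/// value directly. Variables left unbound get a fresh inference variable.
fn guess_substitution(
    num_canonical_vars: usize,
    original_values: &[Term],
    result_values: &[Term],
    mut fresh_var: impl FnMut() -> Term,
) -> Vec<Term> {
    let mut subst: Vec<Option<Term>> = vec![None; num_canonical_vars];
    for (original, result) in original_values.iter().zip(result_values) {
        if let Term::Canonical(index) = result {
            if subst[*index].is_none() {
                subst[*index] = Some(original.clone());
            }
        }
    }
    subst
        .into_iter()
        .map(|slot| match slot {
            Some(term) => term,
            None => fresh_var(),
        })
        .collect()
}
```

The remaining unification (for example, the original first value against Vec applied to the deduced variable) still has to happen afterwards; the sketch covers only the step that avoids creating variables that would be unified away immediately.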

The On-Demand SLG solver

Given a set of program clauses (provided by our lowering rules) and a query, we need to return the result of the query and the value of any type variables we can determine. This is the job of the solver.

For example, exists<T> { Vec<T>: FromIterator<u32> } has one solution, so its result is Unique; substitution [?T := u32]. A solution also comes with a set of region constraints, which we'll ignore in this introduction.

Goals of the Solver

On demand

There are often many, or even infinitely many, solutions to a query. For example, say we want to prove that exists<T> { Vec<T>: Debug } for some type ?T. Our solver should be capable of yielding one answer at a time, say ?T = u32, then ?T = i32, and so on, rather than iterating over every type in the type system. If we need more answers, we can request more until we are done. This is similar to how Prolog works.

See also: The traditional, interactive Prolog query

Breadth-first

Vec<?T>: Debug is true if ?T: Debug. This leads to a cycle: [Vec<u32>, Vec<Vec<u32>>, Vec<Vec<Vec<u32>>>], and so on all implement Debug. Our solver ought to be breadth first and consider answers like [Vec<u32>: Debug, Vec<i32>: Debug, ...] before it recurses, or we may never find the answer we're looking for.
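To make the on-demand and breadth-first goals concrete, here is a small sketch of a lazy answer stream. It is not how the solver is implemented: DebugAnswers is a hypothetical type that hard-codes the assumption that the only impls are for u32, i32, and Vec<T> where T: Debug, and it represents types as strings.

```rust
use std::collections::VecDeque;

/// Lazily enumerates types `T` with `T: Debug` in breadth-first order,
/// assuming impls exist only for `u32`, `i32`, and `Vec<T> where T: Debug`.
struct DebugAnswers {
    queue: VecDeque<String>,
}

impl DebugAnswers {
    fn new() -> Self {
        DebugAnswers {
            queue: VecDeque::from(vec!["u32".to_string(), "i32".to_string()]),
        }
    }
}

impl Iterator for DebugAnswers {
    type Item = String;

    /// Produce exactly one more answer. Each answer `T` also schedules
    /// `Vec<T>` at the back of the queue, so shallow answers come first.
    fn next(&mut self) -> Option<String> {
        let answer = self.queue.pop_front()?;
        self.queue.push_back(format!("Vec<{}>", answer));
        Some(answer)
    }
}
```

A caller that needs only two answers can write DebugAnswers::new().take(2), and no further work is done. A depth-first variant that pushed Vec<T> to the front of the queue would never get around to i32.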

Cacheable

To speed up compilation, we need to cache results, including partial results left over from past solver queries.

Description of how it works

The basis of the solver is the Forest type. A forest stores a collection of tables as well as a stack. Each table represents the stored results of a particular query that is being performed, as well as the various strands, which are basically suspended computations that may be used to find more answers. Tables are interdependent: solving one query may require solving others.
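The shape of these data structures might look roughly like the sketch below. The field names, the string representation of goals, and the get_or_create_table helper are assumptions made for illustration; the real definitions in chalk_engine are richer and differ in detail.

```rust
use std::collections::{HashMap, VecDeque};

/// A suspended computation: a goal plus the subgoals still to be proven.
/// Goals are plain strings here to keep the sketch small.
#[derive(Clone, Debug, PartialEq)]
struct Strand {
    goal: String,
    subgoals: Vec<String>,
}

/// The stored state of one query: answers found so far, plus the strands
/// that may produce more answers later.
#[derive(Debug, Default)]
struct Table {
    answers: Vec<String>,
    strands: VecDeque<Strand>,
}

/// A collection of tables, indexed by canonical goal, plus a stack of the
/// tables currently being worked on.
#[derive(Debug, Default)]
struct Forest {
    tables: Vec<Table>,
    table_indices: HashMap<String, usize>,
    stack: Vec<usize>,
}

impl Forest {
    /// Look up the table for a canonical goal, creating it if needed.
    fn get_or_create_table(&mut self, canonical_goal: &str) -> usize {
        if let Some(&index) = self.table_indices.get(canonical_goal) {
            return index;
        }
        let index = self.tables.len();
        self.tables.push(Table::default());
        self.table_indices.insert(canonical_goal.to_string(), index);
        index
    }
}
```

Keying tables by the canonical goal is what lets two strands that need the same subgoal share one table and its cached answers.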

Walkthrough

Perhaps the easiest way to explain how the solver works is to walk through an example. Let's imagine that we have the following program:

trait Debug { }

struct u32 { }
impl Debug for u32 { }

struct Rc<T> { }
impl<T: Debug> Debug for Rc<T> { }

struct Vec<T> { }
impl<T: Debug> Debug for Vec<T> { }

Now imagine that we want to find answers for the query exists<T> { Rc<T>: Debug }. The first step would be to u-canonicalize this query; this is the act of giving canonical names to all the unbound inference variables based on the order of their left-most appearance, as well as canonicalizing the universes of any universally bound names (e.g., the T in forall<T> { ... }). In this case, there are no universally bound names, but the canonical form Q of the query might look something like:

Rc<?0>: Debug

where ?0 is a variable in the root universe U0. We would then go and look for a table with this canonical query as the key: since the forest is empty, this lookup will fail, and we will create a new table T0, corresponding to the u-canonical goal Q.

Ignoring negative reasoning and regions. To start, we'll ignore the possibility of negative goals like not { Foo }. We'll phase them in later, as they bring several complications.

Creating a table. When we first create a table, we also initialize it with a set of initial strands. A "strand" is kind of like a "thread" for the solver: it contains a particular way to produce an answer. The initial set of strands for a goal like Rc<?0>: Debug (i.e., a "domain goal") is determined by looking for clauses in the environment. In Rust, these clauses derive from impls, but also from where-clauses that are in scope. In the case of our example, there would be three clauses, each coming from the program. Using a Prolog-like notation, these look like:

(u32: Debug).
(Rc<T>: Debug) :- (T: Debug).
(Vec<T>: Debug) :- (T: Debug).

To create our initial strands, then, we will try to apply each of these clauses to our goal of Rc<?0>: Debug. The first and third clauses are inapplicable because u32 and Vec<?0> cannot be unified with Rc<?0>. The second clause, however, will work.
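The clause selection just described can be approximated with a simple structural check. In this sketch, Ty, Clause, may_unify, and applicable_clauses are hypothetical names, and may_unify deliberately treats every variable as a wildcard, which is cruder than real unification (it does not track bindings or perform an occurs check).

```rust
// Toy model only: every clause is implicitly about the `Debug` trait.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    /// A variable: either a clause parameter like `T` or a goal variable like `?0`.
    Var(&'static str),
    /// A type constructor applied to arguments, e.g. `Rc<..>` or `u32`.
    App(&'static str, Vec<Ty>),
}

/// `head: Debug :- conditions: Debug`.
#[derive(Debug)]
struct Clause {
    head: Ty,
    conditions: Vec<Ty>,
}

/// The three clauses from the example program.
fn program() -> Vec<Clause> {
    let t = || Ty::Var("T");
    vec![
        Clause { head: Ty::App("u32", vec![]), conditions: vec![] },
        Clause { head: Ty::App("Rc", vec![t()]), conditions: vec![t()] },
        Clause { head: Ty::App("Vec", vec![t()]), conditions: vec![t()] },
    ]
}

/// Conservative check: could `a` and `b` be made equal? Variables match anything.
fn may_unify(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::Var(_), _) | (_, Ty::Var(_)) => true,
        (Ty::App(n1, args1), Ty::App(n2, args2)) => {
            n1 == n2
                && args1.len() == args2.len()
                && args1.iter().zip(args2).all(|(x, y)| may_unify(x, y))
        }
    }
}

/// Select the clauses whose head could apply to the goal type.
fn applicable_clauses<'c>(goal: &Ty, clauses: &'c [Clause]) -> Vec<&'c Clause> {
    clauses.iter().filter(|clause| may_unify(goal, &clause.head)).collect()
}
```

For the goal Rc<?0>: Debug only the Rc clause survives, while a goal of the form ?0: Debug keeps all three, which matches the strand counts in this walkthrough.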

What is a strand? Let's talk a bit more about what a strand is. In the code, a strand is the combination of an inference table, an X-clause, and (possibly) a selected subgoal from that X-clause. But what is an X-clause (ExClause, in the code)? An X-clause pulls together a few things:

  • The current state of the goal we are trying to prove;
  • A set of subgoals that have yet to be proven;
  • There are also a few things we're ignoring for now:
    • delayed literals, region constraints

The general form of an X-clause is written much like a Prolog clause, but with somewhat different semantics. Since we're ignoring delayed literals and region constraints, an X-clause just looks like this:

G :- L

where G is a goal and L is a set of subgoals that must be proven. (The L stands for literal -- when we address negative reasoning, a literal will be either a positive or negative subgoal.) The idea is that if we are able to prove L then the goal G can be considered true.

In the case of our example, we would wind up creating one strand, with an X-clause like so:

(Rc<?T>: Debug) :- (?T: Debug)

Here, the ?T refers to one of the inference variables created in the inference table that accompanies the strand. (I'll use named variables to refer to inference variables, and numbered variables like ?0 to refer to variables in a canonicalized goal; in the code, however, they are both represented with an index.)

For each strand, we also optionally store a selected subgoal. This is the subgoal after the turnstile (:-) that we are currently trying to prove in this strand. Initially, when a strand is first created, there is no selected subgoal.

Activating a strand. Now that we have created the table T0 and initialized it with strands, we have to actually try and produce an answer. We do this by invoking the ensure_root_answer operation on the table: specifically, we say ensure_root_answer(T0, A0), meaning "ensure that there is a 0th answer A0 to query T0".

Remember that tables store not only strands, but also a vector of cached answers. The first thing that ensure_root_answer does is to check whether answer A0 is in this vector. If so, we can just return immediately. In this case, the vector will be empty, and hence that does not apply (this becomes important for cyclic checks later on).

When there is no cached answer, ensure_root_answer will try to produce one. It does this by selecting a strand from the set of active strands -- the strands are stored in a VecDeque and hence processed in a round-robin fashion. Right now, we have only one strand, storing the following X-clause with no selected subgoal:

(Rc<?T>: Debug) :- (?T: Debug)

When we activate the strand, we see that we have no selected subgoal, and so we first pick one of the subgoals to process. Here, there is only one (?T: Debug), so that becomes the selected subgoal, changing the state of the strand to:

(Rc<?T>: Debug) :- selected(?T: Debug, A0)

Here, we write selected(L, An) to indicate that (a) the literal L is the selected subgoal and (b) which answer An we are looking for. We start out looking for A0.

Processing the selected subgoal. Next, we have to try and find an answer to this selected goal. To do that, we will u-canonicalize it and try to find an associated table. In this case, the u-canonical form of the subgoal is ?0: Debug: we don't have a table yet for that, so we can create a new one, T1. As before, we'll initialize T1 with strands. In this case, there will be three strands, because all the program clauses are potentially applicable. Those three strands will be:

  • (u32: Debug) :-, derived from the program clause (u32: Debug)..
    • Note: This strand has no subgoals.
  • (Vec<?U>: Debug) :- (?U: Debug), derived from the Vec impl.
  • (Rc<?V>: Debug) :- (?V: Debug), derived from the Rc impl.

We can thus summarize the state of the whole forest at this point as follows:

Table T0 [Rc<?0>: Debug]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A0)
  
Table T1 [?0: Debug]
  Strands:
    (u32: Debug) :-
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)

Delegation between tables. Now that the active strand from T0 has created the table T1, it can try to extract an answer. It does this via that same ensure_answer operation we saw before. In this case, the strand would invoke ensure_answer(T1, A0), since we will start with the first answer. This will cause T1 to activate its first strand, u32: Debug :-.

This strand is somewhat special: it has no subgoals at all. This means that the goal is proven. We can therefore add u32: Debug to the set of answers for our table, calling it answer A0 (it is the first answer). The strand is then removed from the list of strands.

The state of table T1 is therefore:

Table T1 [?0: Debug]
  Answers:
    A0 = [?0 = u32]
  Strand:
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)

Note that I am writing out the answer A0 as a substitution that can be applied to the table goal; actually, in the code, the goals for each X-clause are also represented as substitutions, but in this exposition I've chosen to write them as full goals, following NFTD.

Since we now have an answer, ensure_answer(T1, A0) will return Ok to the table T0, indicating that answer A0 is available. T0 now has the job of incorporating that result into its active strand. It does this in two ways. First, it creates a new strand that is looking for the next possible answer of T1. Next, it incorporates the answer from A0 and removes the subgoal. The resulting state of table T0 is:

Table T0 [Rc<?0>: Debug]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A1)
    (Rc<u32>: Debug) :-

We then immediately activate the strand that incorporated the answer (the Rc<u32>: Debug one). In this case, that strand has no further subgoals, so it becomes an answer to the table T0. This answer can then be returned up to our caller, and the whole forest goes quiescent at this point (remember, we only do enough work to generate one answer). The ending state of the forest at this point will be:

Table T0 [Rc<?0>: Debug]
  Answer:
    A0 = [?0 = u32]
  Strands:
    (Rc<?T>: Debug) :- selected(?T: Debug, A1)

Table T1 [?0: Debug]
  Answers:
    A0 = [?0 = u32]
  Strand:
    (Vec<?U>: Debug) :- (?U: Debug)
    (Rc<?V>: Debug) :- (?V: Debug)

Here you can see how the forest captures both the answers we have created thus far and the strands that will let us try to produce more answers later on.

See also

An Overview of Chalk

Chalk is under heavy development, so if any of these links are broken or if any of the information is inconsistent with the code or outdated, please open an issue so we can fix it. If you are able to fix the issue yourself, we would love your contribution!

Chalk recasts Rust's trait system explicitly in terms of logic programming by "lowering" Rust code into a kind of logic program we can then execute queries against. (See Lowering to Logic and Lowering Rules) Its goal is to be an executable, highly readable specification of the Rust trait system.

There are many expected benefits from this work. It will consolidate our existing, somewhat ad-hoc implementation into something far more principled and expressive, which should behave better in corner cases, and be much easier to extend.

Chalk Structure

Chalk has two main "products". The first of these is the chalk_engine crate, which defines the core SLG solver. This is the part rustc uses.

The rest of chalk can be considered an elaborate testing harness. Chalk is capable of parsing Rust-like "programs", lowering them to logic, and performing queries on them.

Here's a sample session in the chalk repl, chalki. After feeding it our program, we perform some queries on it.

?- program
Enter a program; press Ctrl-D when finished
| struct Foo { }
| struct Bar { }
| struct Vec<T> { }
| trait Clone { }
| impl<T> Clone for Vec<T> where T: Clone { }
| impl Clone for Foo { }

?- Vec<Foo>: Clone
Unique; substitution [], lifetime constraints []

?- Vec<Bar>: Clone
No possible solution.

?- exists<T> { Vec<T>: Clone }
Ambiguous; no inference guidance

You can see more examples of programs and queries in the unit tests.

Next we'll go through each stage required to produce the output above.

Parsing (chalk_parse)

Chalk is designed to be incorporated with the Rust compiler, so the syntax and concepts it deals with heavily borrow from Rust. It is convenient for the sake of testing to be able to run chalk on its own, so chalk includes a parser for a Rust-like syntax. This syntax is orthogonal to the Rust AST and grammar. It is not intended to look exactly like it or support the exact same syntax.

The parser takes that syntax and produces an Abstract Syntax Tree (AST). You can find the complete definition of the AST in the source code.

The syntax contains things from Rust that we know and love, for example: traits, impls, and struct definitions. Parsing is often the first "phase" of transformation that a program goes through in order to become a format that chalk can understand.

Rust Intermediate Representation (rust_ir)

After getting the AST we convert it to a more convenient intermediate representation called rust_ir. This is sort of analogous to the HIR in Rust. The process of converting to IR is called lowering.

The rust_ir::Program struct contains some "rust things" but indexed and accessible in a different way. For example, if you have a type like Foo<Bar>, we would represent Foo as a string in the AST but in rust_ir::Program, we use numeric indices (ItemId).
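The difference between names and numeric indices can be illustrated with a tiny interner. The ItemId and Interner types below are hypothetical stand-ins written for this example; they are not the definitions used in chalk.

```rust
use std::collections::HashMap;

/// Hypothetical numeric handle for an item such as a struct or trait.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ItemId(usize);

/// Maps the names that appear in the AST to stable numeric ids and back.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, ItemId>,
    names: Vec<String>,
}

impl Interner {
    /// Return the id for `name`, allocating the next free id on first use.
    fn intern(&mut self, name: &str) -> ItemId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = ItemId(self.names.len());
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Recover the original name, e.g. for error messages.
    fn name(&self, id: ItemId) -> &str {
        &self.names[id.0]
    }
}
```

With ids in place, comparing two items is an integer comparison, and later stages never need to look at the source spelling again.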

The IR source code contains the complete definition.

Chalk Intermediate Representation (chalk_ir)

Once we have Rust IR it is time to convert it to "program clauses". A ProgramClause is essentially one of the following:

  • A clause of the form consequence :- conditions where :- is read as "if" and conditions = cond1 && cond2 && ...
  • A universally quantified clause of the form forall<T> { consequence :- conditions }
    • forall<T> { ... } is used to represent universal quantification. See the section on Lowering to logic for more information.
    • A key thing to note about forall is that we don't allow you to "quantify" over traits, only types and regions (lifetimes). That is, you can't make a rule like forall<Trait> { u32: Trait } which would say "u32 implements all traits". You can however say forall<T> { T: Trait } meaning "Trait is implemented by all types".
    • forall<T> { ... } is represented in the code using the Binders<T> struct.

See also: Goals and Clauses

This is where we encode the rules of the trait system into logic. For example, if we have the following Rust:

impl<T: Clone> Clone for Vec<T> {}

We generate the following program clause:

forall<T> { (Vec<T>: Clone) :- (T: Clone) }

This rule dictates that Vec<T>: Clone is only satisfied if T: Clone is also satisfied (i.e. "provable").
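A naive way to read such clauses operationally is sketched below for goals without variables. This is only an illustration of what "provable" means, not the SLG solver: Ty, Clause, match_head, substitute, and provable are hypothetical names, every clause is implicitly about Clone, and the recursion terminates only because each condition here mentions a smaller type than the head.

```rust
use std::collections::HashMap;

// Toy model only: every clause is implicitly about the `Clone` trait.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    /// A universally quantified clause parameter such as `T`.
    Param(&'static str),
    /// A type constructor applied to arguments, e.g. `Vec<Foo>` or `Foo`.
    App(&'static str, Vec<Ty>),
}

/// `forall<..> { head: Clone :- conditions: Clone }`.
struct Clause {
    head: Ty,
    conditions: Vec<Ty>,
}

/// Match a clause head against a variable-free type, recording what each
/// parameter must stand for.
fn match_head<'a>(
    pattern: &Ty,
    ty: &'a Ty,
    bindings: &mut HashMap<&'static str, &'a Ty>,
) -> bool {
    match (pattern, ty) {
        (Ty::Param(name), _) => {
            if let Some(bound) = bindings.get(name).copied() {
                return bound == ty;
            }
            bindings.insert(*name, ty);
            true
        }
        (Ty::App(n1, a1), Ty::App(n2, a2)) => {
            n1 == n2
                && a1.len() == a2.len()
                && a1.iter().zip(a2).all(|(p, t)| match_head(p, t, bindings))
        }
        // Goals are assumed to contain no parameters.
        (Ty::App(..), Ty::Param(_)) => false,
    }
}

/// Replace clause parameters with the types they were bound to.
fn substitute(ty: &Ty, bindings: &HashMap<&'static str, &Ty>) -> Ty {
    match ty {
        Ty::Param(name) => bindings
            .get(name)
            .map(|t| (*t).clone())
            .unwrap_or_else(|| ty.clone()),
        Ty::App(n, args) => Ty::App(*n, args.iter().map(|a| substitute(a, bindings)).collect()),
    }
}

/// `goal: Clone` holds if some clause head matches and all of its
/// conditions are provable in turn.
fn provable(goal: &Ty, clauses: &[Clause]) -> bool {
    clauses.iter().any(|clause| {
        let mut bindings = HashMap::new();
        match_head(&clause.head, goal, &mut bindings)
            && clause
                .conditions
                .iter()
                .all(|c| provable(&substitute(c, &bindings), clauses))
    })
}
```

With one fact for Foo and the Vec clause above, Vec<Foo> is provable and Vec<Bar> is not, mirroring the first two queries in the chalki session.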

Similar to rust_ir::Program, which has "rust-like things", chalk_ir defines ProgramEnvironment, which is "pure logic". The main field in that struct is program_clauses, which contains the ProgramClauses generated by the rules module.

Rules

The rules module (source code) defines the logic rules we use for each item in the Rust IR. It works by iterating over every trait, impl, etc. and emitting the rules that come from each one.

See also: Lowering Rules

Well-formedness checks

As part of lowering to logic, we also do some "well formedness" checks. See the rules::wf source code for where those are done.

See also: Well-formedness checking

Coherence

The function record_specialization_priorities in the coherence module (source code) checks "coherence", which means that it ensures that two impls of the same trait for the same type cannot exist.

Solver (chalk_solve)

Finally, when we've collected all the program clauses we care about, we want to perform queries on it. The component that finds the answer to these queries is called the solver.

See also: The SLG Solver

Crates

Chalk's functionality is broken up into the following crates:

  • chalk_engine: Defines the core SLG solver.
  • chalk_ir: Defines chalk's internal representation of types, lifetimes, and goals.
  • chalk_solve: Combines chalk_ir and chalk_engine, effectively.
  • chalk_parse: Defines the raw AST and a parser.
  • chalk: Brings everything together. Defines the following modules:
    • rust_ir, containing the "HIR-like" form of the AST
      • rust_ir::lowering, which converts AST to rust_ir
    • rules, which implements logic rules converting rust_ir to chalk_ir
    • coherence, which implements coherence rules
    • Also includes chalki, chalk's REPL.

Browse source code on GitHub

Testing

chalk has a test framework for lowering programs to logic, checking the lowered logic, and performing queries on it. This is how we test the implementation of chalk itself, and the viability of the lowering rules.

The main kind of tests in chalk are goal tests. They contain a program, which is expected to lower to logic successfully, and a set of queries (goals) along with the expected output. Here's an example. Since chalk's output can be quite long, goal tests support specifying only a prefix of the output.

Lowering tests check the stages that occur before we can issue queries to the solver: the lowering to rust_ir, and the well-formedness checks that occur after that.

Testing internals

Goal tests use a test! macro that takes chalk's Rust-like syntax and runs it through the full pipeline described above. The macro ultimately calls the solve_goal function.

Likewise, lowering tests use the lowering_success! and lowering_error! macros.

More Resources

Blog Posts

Bibliography

If you'd like to read more background material, here are some recommended texts and papers:

Programming with Higher-order Logic, by Dale Miller and Gopalan Nadathur, covers the key concepts of Lambda Prolog. Although it's a slim little volume, it's the kind of book where you learn something new every time you open it.

"A proof procedure for the logic of Hereditary Harrop formulas", by Gopalan Nadathur. This paper covers the basics of universes, environments, and Lambda Prolog-style proof search. Quite readable.

"A new formulation of tabled resolution with delay", by Terrance Swift. This paper gives a kind of abstract treatment of the SLG formulation that is the basis for our on-demand solver.

Type checking

The rustc_typeck crate contains the source for "type collection" and "type checking", as well as a few other bits of related functionality. (It draws heavily on the type inference and trait solving.)

Type collection

Type "collection" is the process of converting the types found in the HIR (hir::Ty), which represent the syntactic things that the user wrote, into the internal representation used by the compiler (Ty<'tcx>) – we also do similar conversions for where-clauses and other bits of the function signature.

To try and get a sense for the difference, consider this function:

struct Foo { }
fn foo(x: Foo, y: self::Foo) { ... }
//        ^^^     ^^^^^^^^^

Those two parameters x and y each have the same type: but they will have distinct hir::Ty nodes. Those nodes will have different spans, and of course they encode the path somewhat differently. But once they are "collected" into Ty<'tcx> nodes, they will be represented by the exact same internal type.
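The following toy model shows the idea of two syntactic spellings collapsing into one internal type. HirTy, Ty, and collect here are simplified, hypothetical stand-ins (the real hir::Ty and Ty<'tcx> are far richer), and path resolution handles only a leading self segment.

```rust
use std::collections::HashMap;

/// Syntactic type as written by the user: a path plus the span it came from.
#[derive(Debug, PartialEq)]
struct HirTy {
    path: Vec<&'static str>,
    span: (usize, usize),
}

/// Semantic type: identifies the definition, with no trace of its spelling.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ty {
    def_index: usize,
}

/// Resolve a syntactic type against the items of the current module.
fn collect(hir_ty: &HirTy, module_items: &HashMap<&'static str, usize>) -> Option<Ty> {
    // Drop a leading `self` segment, which refers to the current module.
    let segments = match hir_ty.path.split_first() {
        Some((first, rest)) if *first == "self" => rest,
        _ => &hir_ty.path[..],
    };
    match segments {
        [name] => module_items.get(name).map(|&def_index| Ty { def_index }),
        _ => None,
    }
}
```

The two HirTy values for Foo and self::Foo compare unequal because their paths and spans differ, yet collect maps both to the same Ty.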

Collection is defined as a bundle of queries for computing information about the various functions, traits, and other items in the crate being compiled. Note that each of these queries is concerned with interprocedural things – for example, for a function definition, collection will figure out the type and signature of the function, but it will not visit the body of the function in any way, nor examine type annotations on local variables (that's the job of type checking).

For more details, see the collect module.

TODO: actually talk about type checking...

Method lookup

Method lookup can be rather complex due to the interaction of a number of factors, such as self types, autoderef, trait lookup, etc. This file provides an overview of the process. More detailed notes are in the code itself, naturally.

One way to think of method lookup is that we convert an expression of the form:

receiver.method(...)

into a more explicit UFCS form:

Trait::method(ADJ(receiver), ...) // for a trait call
ReceiverType::method(ADJ(receiver), ...) // for an inherent method call

Here ADJ is some kind of adjustment, which is typically a series of autoderefs and then possibly an autoref (e.g., &**receiver). However we sometimes do other adjustments and coercions along the way, in particular unsizing (e.g., converting from [T; n] to [T]).
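The desugaring can be written out by hand in ordinary Rust. In this small example the Describe trait, the Point type, and the call_both_ways function are made up for illustration; the adjustment for a Box<Point> receiver is one deref followed by an autoref, i.e. &*receiver.

```rust
trait Describe {
    fn describe(&self) -> String;
}

struct Point {
    x: i32,
    y: i32,
}

impl Point {
    /// An inherent method.
    fn norm1(&self) -> i32 {
        self.x.abs() + self.y.abs()
    }
}

impl Describe for Point {
    /// A trait method.
    fn describe(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

/// Calls each method twice: once with method-call syntax and once in the
/// explicit form with the adjustment written out.
fn call_both_ways(receiver: Box<Point>) -> (String, String, i32, i32) {
    (
        receiver.describe(),
        Describe::describe(&*receiver), // trait call
        receiver.norm1(),
        Point::norm1(&*receiver), // inherent method call
    )
}
```

Both spellings of each call resolve to the same function, so the paired results are always equal.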

Method lookup is divided into two major phases:

  1. Probing (probe.rs). The probe phase is when we decide what method to call and how to adjust the receiver.
  2. Confirmation (confirm.rs). The confirmation phase "applies" this selection, updating the side-tables, unifying type variables, and otherwise doing side-effectful things.

One reason for this division is to be more amenable to caching. The probe phase produces a "pick" (probe::Pick), which is designed to be cacheable across method-call sites. Therefore, it does not include inference variables or other information.

The Probe phase

Steps

The first thing that the probe phase does is to create a series of steps. This is done by progressively dereferencing the receiver type until it cannot be deref'd anymore, as well as applying an optional "unsize" step. So if the receiver has type Rc<Box<[T; 3]>>, this might yield:

Rc<Box<[T; 3]>>
Box<[T; 3]>
[T; 3]
[T]
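The construction of this list can be sketched on a toy type representation. The Ty enum and probe_steps function are hypothetical; the real probe code works on Ty<'tcx> and uses the Deref machinery rather than a hard-coded list of pointer types.

```rust
// Toy model only: a handful of type shapes, enough for the example above.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Param(&'static str),
    Rc(Box<Ty>),
    Boxed(Box<Ty>),
    Array(Box<Ty>, usize),
    Slice(Box<Ty>),
}

/// Build the list of steps: the receiver type, everything reachable by
/// repeated dereferencing, and finally an optional unsize step.
fn probe_steps(receiver: &Ty) -> Vec<Ty> {
    let mut steps = vec![receiver.clone()];
    let mut current = receiver.clone();
    loop {
        // Dereference for as long as the current type is a smart pointer.
        let next = match &current {
            Ty::Rc(inner) | Ty::Boxed(inner) => (**inner).clone(),
            _ => break,
        };
        steps.push(next.clone());
        current = next;
    }
    // An array at the end of the chain may additionally be unsized to a slice.
    if let Ty::Array(elem, _) = &current {
        steps.push(Ty::Slice(elem.clone()));
    }
    steps
}
```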

Candidate assembly

We then search along those steps to create a list of candidates. A Candidate is a method item that might plausibly be the method being invoked. For each candidate, we'll derive a "transformed self type" that takes into account explicit self.

Candidates are grouped into two kinds, inherent and extension.

Inherent candidates are those that are derived from the type of the receiver itself. So, if you have a receiver of some nominal type Foo (e.g., a struct), any methods defined within an impl like impl Foo are inherent methods. Nothing needs to be imported to use an inherent method; they are associated with the type itself (note that inherent impls can only be defined in the same crate as the type itself).

FIXME: Inherent candidates are not always derived from impls. If you have a trait object, such as a value of type Box<ToString>, then the trait methods (to_string(), in this case) are inherently associated with it. Another case is type parameters, in which case the methods of their bounds are inherent. However, this part of the rules is subject to change: when DST's "impl Trait for Trait" is complete, trait object dispatch could be subsumed into trait matching, and the type parameter behavior should be reconsidered in light of where clauses.

TODO: Is this FIXME still accurate?

Extension candidates are derived from imported traits. If I have the trait ToString imported, and I call to_string() on a value of type T, then we will go off to find out whether there is an impl of ToString for T. These kinds of method calls are called "extension methods". They can be defined in any crate, not only the one that defined T. Furthermore, you must import the trait to call such a method.

So, let's continue our example. Imagine that we were calling a method foo with the receiver Rc<Box<[T; 3]>> and there is a trait Foo that defines it with &self for the type Rc<U>, as well as an inherent method on the type Box that defines foo but with &mut self. Then we might have two candidates:

&Rc<Box<[T; 3]>> from the impl of `Foo` for `Rc<U>` where `U=Box<[T; 3]>`
&mut Box<[T; 3]> from the inherent impl on `Box<U>` where `U=[T; 3]`

Candidate search

Finally, to actually pick the method, we will search down the steps, trying to match the receiver type against the candidate types. At each step, we also consider an auto-ref and auto-mut-ref to see whether that makes any of the candidates match. We pick the first step where we find a match.

In the case of our example, the first step is Rc<Box<[T; 3]>>, which does not itself match any candidate. But when we autoref it, we get the type &Rc<Box<[T; 3]>> which does match. We would then recursively consider all where-clauses that appear on the impl: if those match (or we cannot rule out that they do), then this is the method we would pick. Otherwise, we would continue down the series of steps.

Variance of type and lifetime parameters

For a more general background on variance, see the background appendix.

During type checking we must infer the variance of type and lifetime parameters. The algorithm is taken from Section 4 of the paper "Taming the Wildcards: Combining Definition- and Use-Site Variance" published in PLDI'11 and written by Altidor et al., and hereafter referred to as The Paper.

This inference is explicitly designed not to consider the uses of types within code. To determine the variance of type parameters defined on type X, we only consider the definition of the type X and the definitions of any types it references.

We only infer variance for type parameters found on data types like structs and enums. In these cases, there is a fairly straightforward explanation for what variance means. The variance of the type or lifetime parameters defines whether T<A> is a subtype of T<B> (resp. T<'a> and T<'b>) based on the relationship of A and B (resp. 'a and 'b).
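A small, ordinary Rust example shows what this means in practice. The Wrapper type and the two functions are made up for illustration; both functions compile only because the compiler infers that Option<T> and Wrapper<T> are covariant in T, and because 'static outlives any 'a.

```rust
/// A user-defined type whose parameter is inferred to be covariant,
/// because `T` is only stored by value.
struct Wrapper<T> {
    value: T,
}

/// `Option<&'static str>` is accepted where `Option<&'a str>` is expected.
fn shorten<'a>(long: Option<&'static str>, _shorter: &'a str) -> Option<&'a str> {
    long
}

/// The same subtyping works for the user-defined `Wrapper`.
fn shorten_wrapper<'a>(long: Wrapper<&'static str>, _shorter: &'a str) -> Wrapper<&'a str> {
    long
}
```

If Wrapper stored its parameter in a std::cell::Cell<T> instead, T would be inferred invariant and shorten_wrapper would be rejected.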

We do not infer variance for type parameters found on traits, functions, or impls. Variance on trait parameters can indeed make sense (and we used to compute it) but it is actually rather subtle in meaning and not that useful in practice, so we removed it. See the addendum for some details. Variance on function/impl parameters, on the other hand, doesn't make sense because these parameters are instantiated and then forgotten; they don't persist in types or compiled byproducts.

Notation

We use the notation of The Paper throughout this chapter:

  • + is covariance.
  • - is contravariance.
  • * is bivariance.
  • o is invariance.

The algorithm

The basic idea is quite straightforward. We iterate over the types defined and, for each use of a type parameter X, accumulate a constraint indicating that the variance of X must be valid for the variance of that use site. We then iteratively refine the variance of X until all constraints are met. There is always a solution, because at the limit we can declare all type parameters to be invariant and all constraints will be satisfied.

As a simple example, consider:

enum Option<A> { Some(A), None }
enum OptionalFn<B> { Some(fn(B)), None }
enum OptionalMap<C> { Some(fn(C) -> C), None }

Here, we will generate the constraints:

1. V(A) <= +
2. V(B) <= -
3. V(C) <= +
4. V(C) <= -

These indicate that (1) the variance of A must be at most covariant; (2) the variance of B must be at most contravariant; and (3, 4) the variance of C must be at most covariant and contravariant. All of these results are based on a variance lattice defined as follows:

   *      Top (bivariant)
-     +
   o      Bottom (invariant)

Based on this lattice, the solution V(A)=+, V(B)=-, V(C)=o is the optimal solution. Note that there is always a naive solution which just declares all variables to be invariant.
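For constraints that are plain upper bounds, as in this example, the optimal solution is just the greatest lower bound of the bounds in the lattice. The sketch below assumes that simplified setting; the Variance enum and the glb and solve functions are written for illustration and are not the compiler's solver, which also has to handle terms that mention other variables.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Variance {
    Bivariant,     // *
    Covariant,     // +
    Contravariant, // -
    Invariant,     // o
}

impl Variance {
    /// Greatest lower bound in the lattice: `*` is the top, `o` the bottom,
    /// and `+` and `-` are incomparable, so their meet is `o`.
    fn glb(self, other: Variance) -> Variance {
        use Variance::*;
        match (self, other) {
            (Bivariant, v) | (v, Bivariant) => v,
            (Covariant, Covariant) => Covariant,
            (Contravariant, Contravariant) => Contravariant,
            _ => Invariant,
        }
    }
}

/// Start at the top of the lattice and lower the variance just enough to
/// satisfy every constraint of the form `V(X) <= bound`.
fn solve(upper_bounds: &[Variance]) -> Variance {
    upper_bounds
        .iter()
        .fold(Variance::Bivariant, |acc, &bound| acc.glb(bound))
}
```

Applied to the constraints above, a single + or - bound is returned unchanged, while the pair of bounds on C meets at o.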

You may be wondering why fixed-point iteration is required. The reason is that the variance of a use site may itself be a function of the variance of other type parameters. In full generality, our constraints take the form:

V(X) <= Term
Term := + | - | * | o | V(X) | Term x Term

Here the notation V(X) indicates the variance of a type/region parameter X with respect to its defining class. Term x Term represents the "variance transform" as defined in the paper:

If the variance of a type variable X in type expression E is V2 and the definition-site variance of the [corresponding] type parameter of a class C is V1, then the variance of X in the type expression C<E> is V3 = V1.xform(V2).
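The transform can be written as a small table. The sketch below reflects our reading of it, with V1 as the receiver and V2 as the argument; the Variance enum is redefined so the example stands alone, and the entries involving bivariance in particular should be treated as an assumption rather than a quotation from the paper.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Variance {
    Bivariant,     // *
    Covariant,     // +
    Contravariant, // -
    Invariant,     // o
}

impl Variance {
    /// `v1.xform(v2)`: the variance of `X` in `C<E>`, where `v1` is the
    /// variance of `C`'s parameter and `v2` is the variance of `X` in `E`.
    fn xform(self, v2: Variance) -> Variance {
        use Variance::*;
        match (self, v2) {
            // A covariant position leaves the inner variance unchanged.
            (Covariant, v) => v,
            // A contravariant position flips `+` and `-`.
            (Contravariant, Covariant) => Contravariant,
            (Contravariant, Contravariant) => Covariant,
            (Contravariant, Invariant) => Invariant,
            (Contravariant, Bivariant) => Bivariant,
            // An invariant position forces invariance.
            (Invariant, _) => Invariant,
            // A bivariant position ignores its argument entirely.
            (Bivariant, _) => Bivariant,
        }
    }
}
```

For instance, X inside a function argument that is itself inside a function argument sees two contravariant positions, which compose to a covariant one.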

Constraints

If I have a struct or enum with where clauses:

struct Foo<T: Bar> { ... }

you might wonder whether the variance of T with respect to Bar affects the variance of T with respect to Foo. I claim no. The reason: assume that T is invariant with respect to Bar but covariant with respect to Foo. And then we have a Foo<X> that is upcast to Foo<Y>, where X <: Y. However, while X : Bar, Y : Bar does not hold. In that case, the upcast will be illegal, not because of a variance failure, but rather because the target type Foo<Y> is itself just not well-formed. Basically we get to assume well-formedness of all types involved before considering variance.

Dependency graph management

Because variance is a whole-crate inference, its dependency graph can become quite muddled if we are not careful. To resolve this, we refactor into two queries:

  • crate_variances computes the variance for all items in the current crate.
  • variances_of accesses the variance for an individual item; it works by requesting crate_variances and extracting the relevant data.

If you limit yourself to reading variances_of, your code will then depend only on the inference of that particular item.

Ultimately, this setup relies on the red-green algorithm. In particular, every variance query effectively depends on all type definitions in the entire crate (through crate_variances), but since most changes will not result in a change to the actual results from variance inference, the variances_of query will wind up being considered green after it is re-evaluated.

Addendum: Variance on traits

As mentioned above, we used to permit variance on traits. This was computed based on the appearance of trait type parameters in method signatures and was used to represent the compatibility of vtables in trait objects (and also "virtual" vtables or dictionary in trait bounds). One complication was that variance for associated types is less obvious, since they can be projected out and put to myriad uses, so it's not clear when it is safe to allow X<A>::Bar to vary (or indeed just what that means). Moreover (as covered below) all inputs on any trait with an associated type had to be invariant, limiting the applicability. Finally, the annotations (MarkerTrait, PhantomFn) needed to ensure that all trait type parameters had a variance were confusing and annoying for little benefit.

Just for historical reference, I am going to preserve some text indicating how one could interpret variance and trait matching.

Variance and object types

Just as with structs and enums, we can decide the subtyping relationship between two object types &Trait<A> and &Trait<B> based on the relationship of A and B. Note that for object types we ignore the Self type parameter – it is unknown, and the nature of dynamic dispatch ensures that we will always call a function that is expecting the appropriate Self type. However, we must be careful with the other type parameters, or else we could end up calling a function that is expecting one type but provided another.

To see what I mean, consider a trait like so:


trait ConvertTo<A> {
    fn convertTo(&self) -> A;
}

Intuitively, if we had one object O=&ConvertTo<Object> and another S=&ConvertTo<String>, then S <: O because String <: Object (presuming Java-like "string" and "object" types, my go-to examples for subtyping). The actual algorithm would be to compare the (explicit) type parameters pairwise respecting their variance: here, the type parameter A is covariant (it appears only in a return position), and hence we require that String <: Object.

You'll note though that we did not consider the binding for the (implicit) Self type parameter: in fact, it is unknown, so that's good. The reason we can ignore that parameter is precisely because we don't need to know its value until a call occurs, and at that time (as you said) the dynamic nature of virtual dispatch means the code we run will be correct for whatever value Self happens to be bound to for the particular object whose method we called. Self is thus different from A, because the caller requires that A be known in order to know the return type of the method convertTo(). (As an aside, we have rules preventing methods where Self appears outside of the receiver position from being called via an object.)

Trait variance and vtable resolution

But traits aren't only used with objects. They're also used when deciding whether a given impl satisfies a given trait bound. To set the scene here, imagine I had a function:

fn convertAll<A,T:ConvertTo<A>>(v: &[T]) { ... }

Now imagine that I have an implementation of ConvertTo for Object:

impl ConvertTo<i32> for Object { ... }

And I want to call convertAll on an array of strings. Suppose further that for whatever reason I specifically supply the value of String for the type parameter T:

let mut vector = vec!["string", ...];
convertAll::<i32, String>(vector);

Is this legal? To put it another way, can we apply the impl for Object to the type String? The answer is yes, but to see why we have to expand out what will happen:

  • convertAll will create a pointer to one of the entries in the vector, which will have type &String

  • It will then call the impl of convertTo() that is intended for use with objects. This has the type fn(self: &Object) -> i32.

    It is OK to provide a value for self of type &String because &String <: &Object.

OK, so intuitively we want this to be legal, so let's bring this back to variance and see whether we are computing the correct result. We must first figure out how to phrase the question "is an impl for Object,i32 usable where an impl for String,i32 is expected?"

Maybe it's helpful to think of a dictionary-passing implementation of type classes. In that case, convertAll() takes an implicit parameter representing the impl. In short, we have an impl of type:

V_O = ConvertTo<i32> for Object

and the function prototype expects an impl of type:

V_S = ConvertTo<i32> for String

As with any argument, this is legal if the type of the value given (V_O) is a subtype of the type expected (V_S). So is V_O <: V_S? The answer will depend on the variance of the various parameters. In this case, because the Self parameter is contravariant and A is covariant, it means that:

V_O <: V_S iff
    i32 <: i32
    String <: Object

These conditions are satisfied and so we are happy.
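The dictionary-passing view can be tried out in today's Rust with a hand-written dictionary struct. This is only a sketch under stated assumptions: ConvertToDict and the function names are made up for this example, and since Rust has no String <: Object relationship, lifetime subtyping (&'static str <: &'a str) stands in for it. The struct holds a function pointer, so variance inference makes it contravariant in SelfTy and covariant in A, which is the combination discussed above.

```rust
/// A hand-written "dictionary" for an impl of `ConvertTo<A>` for `SelfTy`
/// (hypothetical type, used only for this illustration).
struct ConvertToDict<SelfTy, A> {
    convert_to: fn(SelfTy) -> A,
}

/// This function type-checks only because `ConvertToDict` is contravariant
/// in `SelfTy` and covariant in `A`: both parameters need
/// `&'static str <: &'a str`, which holds.
fn use_where_other_is_expected<'a>(
    dict: ConvertToDict<&'a str, &'static str>,
) -> ConvertToDict<&'static str, &'a str> {
    dict
}

/// The "impl": accepts a string of any lifetime, returns a static one.
fn describe(_input: &str) -> &'static str {
    "converted"
}

fn demo() -> &'static str {
    let dict: ConvertToDict<&str, &'static str> = ConvertToDict { convert_to: describe };
    // Supply the more general dictionary where a more specific one is expected.
    let specific = use_where_other_is_expected(dict);
    (specific.convert_to)("hello")
}
```

Swapping the two type arguments in the return type of use_where_other_is_expected would require the opposite subtyping relations, and the function would be rejected.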

Variance and associated types

Traits with associated types – or at minimum projection expressions – must be invariant with respect to all of their inputs. To see why this makes sense, consider what subtyping for a trait reference means:

<T as Trait> <: <U as Trait>

means that if I know that T as Trait, I also know that U as Trait. Moreover, if you think of it as dictionary passing style, it means that a dictionary for <T as Trait> is safe to use where a dictionary for <U as Trait> is expected.

The problem is that when you can project types out from <T as Trait>, the relationship to types projected out of <U as Trait> is completely unknown unless T==U (see #21726 for more details). Making Trait invariant ensures that this is true.

Another related reason is that if we didn't make traits with associated types invariant, then projection is no longer a function with a single result. Consider:

trait Identity { type Out; fn foo(&self); }
impl<T> Identity for T { type Out = T; ... }

Now if I have <&'static () as Identity>::Out, this can be validly derived as &'a () for any 'a:

<&'a () as Identity> <: <&'static () as Identity>
if &'static () <: &'a ()    -- Identity is contravariant in Self
if 'static : 'a             -- Subtyping rules for relations

This change, on the other hand, means that <&'static () as Identity>::Out is always &'static () (which might then be upcast to &'a (), separately). This was helpful in solving #21750.

Existential Types

Existential types are essentially strong type aliases which only expose a specific set of traits as their interface and the concrete type in the background is inferred from a certain set of use sites of the existential type.

In the language they are expressed via

existential type Foo: Bar;

This is an existential type named Foo which can be interacted with via the Bar trait's interface.

Since there needs to be a concrete background type, you can currently express that type by using the existential type in a "defining use site".

struct Struct;
impl Bar for Struct { /* stuff */ }
fn foo() -> Foo {
    Struct
}

Any other "defining use site" needs to produce the exact same type.
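On a stable compiler, a close analogue is impl Trait in return position, which follows the same rule: callers see only the trait's interface, and every return path must produce the same concrete type. The sketch below illustrates that analogue rather than the existential type syntax itself; the method name and the flag parameter are made up for the example.

```rust
trait Bar {
    fn name(&self) -> &'static str;
}

struct Struct(u32);

impl Bar for Struct {
    fn name(&self) -> &'static str {
        if self.0 == 0 { "zero" } else { "nonzero" }
    }
}

/// The concrete type behind `impl Bar` is inferred from the body.
/// Both branches return `Struct`; returning a different type from one
/// of them would be rejected, just as a second defining use site must
/// agree with the first.
fn foo(flag: bool) -> impl Bar {
    if flag {
        Struct(0)
    } else {
        Struct(1)
    }
}
```

Callers of foo can use only the methods of Bar on the result, even though the value is a Struct underneath.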

Defining use site(s)

Currently, only the return value of a function can be a defining use site of an existential type (and only if the return type of that function contains the existential type).

The defining use of an existential type can be any code within the parent of the existential type definition. This includes any siblings of the existential type and all children of the siblings.

The initiative for "not causing fatal brain damage to developers due to accidentally running infinite loops in their brain while trying to comprehend what the type system is doing" has decided to disallow children of existential types to be defining use sites.

Associated existential types

Associated existential types can be defined by any other associated item on the same trait impl or a child of these associated items.

The MIR (Mid-level IR)

MIR is Rust's Mid-level Intermediate Representation. It is constructed from HIR. MIR was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation.

If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.

Introduction to MIR

MIR is defined in the src/librustc/mir/ module, but much of the code that manipulates it is found in src/librustc_mir.

Some of the key characteristics of MIR are:

  • It is based on a control-flow graph.
  • It does not have nested expressions.
  • All types in MIR are fully explicit.

Key MIR vocabulary

This section introduces the key concepts of MIR, summarized here:

  • Basic blocks: units of the control-flow graph, consisting of:
    • statements: actions with one successor
    • terminators: actions with potentially multiple successors; always at the end of a block
    • (if you're not familiar with the term basic block, see the background chapter)
  • Locals: Memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries. These are identified by an index, written with a leading underscore, like _1. There is also a special "local" (_0) allocated to store the return value.
  • Places: expressions that identify a location in memory, like _1 or _1.f.
  • Rvalues: expressions that produce a value. The "R" stands for the fact that these are the "right-hand side" of an assignment.
    • Operands: the arguments to an rvalue, which can either be a constant (like 22) or a place (like _1).

You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try putting this program into play (or clicking on this link), and then clicking the "MIR" button on the top:

fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}

You should see something like:

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
    ...
}

This is the MIR format for the main function.

Variable declarations. If we drill in a bit, we'll see it begins with a bunch of variable declarations. They look like this:

let mut _0: ();                      // return place
scope 1 {
    let mut _1: std::vec::Vec<i32>;  // "vec" in scope 1 at src/main.rs:2:9: 2:16
}
scope 2 {
}
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;

You can see that variables in MIR don't have names, they have indices, like _0 or _1. We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3). You can tell the difference because user-defined variables have a comment that gives you their original name (// "vec" in scope 1...). The "scope" blocks (e.g., scope 1 { .. }) describe the lexical structure of the source program (which names were in scope when).

Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I am ignoring some of the comments):

bb0: {
    StorageLive(_1);
    _1 = const <std::vec::Vec<T>>::new() -> bb2;
}

A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:

StorageLive(_1);

This statement indicates that the variable _1 is "live", meaning that it may be used later – this will persist until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used. These "storage statements" are used by LLVM to allocate stack space.

The terminator of the block bb0 is the call to Vec::new:

_1 = const <std::vec::Vec<T>>::new() -> bb2;

Terminators are different from statements because they can have more than one successor – that is, control may flow to different places. Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we are able to see that indeed unwinding is not possible, and hence we list only one successor block, bb2.

If we look ahead to bb2, we will see it looks like this:

bb2: {
    StorageLive(_3);
    _3 = &mut _1;
    _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}

Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:

_3 = &mut _1;

Assignments in general have the form:

<Place> = <Rvalue>

A place is an expression like _3, _3.f or *_3 – it denotes a location in memory. An Rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>. So we can kind of define a grammar for rvalues like so:

<Rvalue>  = & (mut)? <Place>
          | <Operand> + <Operand>
          | <Operand> - <Operand>
          | ...

<Operand> = Constant
          | copy Place
          | move Place

As you can see from this grammar, rvalues cannot be nested – they can only reference places and constants. Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type). So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:

TMP1 = a + b
x = TMP1 + c

(Try it and see, though you may want to do release mode to skip over the overflow checks.)
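To make the flattening concrete, here is a toy lowering that turns a nested addition into statements whose operands are only variables or temporaries. It is a sketch for illustration, not the compiler's lowering code: the Expr type, the Lowerer struct, and the string-based statements are all assumptions of this example.

```rust
/// A tiny nested expression language: variables and additions.
enum Expr {
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// Accumulates flat statements and hands out fresh temporaries.
struct Lowerer {
    next_tmp: usize,
    statements: Vec<String>,
}

impl Lowerer {
    /// Lowers `expr` to something usable as an operand: a variable as is,
    /// or a fresh temporary holding the result of a nested addition.
    fn as_operand(&mut self, expr: &Expr) -> String {
        match expr {
            Expr::Var(name) => name.to_string(),
            Expr::Add(lhs, rhs) => {
                let lhs = self.as_operand(lhs);
                let rhs = self.as_operand(rhs);
                self.next_tmp += 1;
                let tmp = format!("TMP{}", self.next_tmp);
                self.statements.push(format!("{} = {} + {}", tmp, lhs, rhs));
                tmp
            }
        }
    }

    /// Lowers `dest = expr`, writing the outermost operation directly
    /// into `dest` instead of into a temporary.
    fn assign(&mut self, dest: &str, expr: &Expr) {
        match expr {
            Expr::Var(name) => self.statements.push(format!("{} = {}", dest, name)),
            Expr::Add(lhs, rhs) => {
                let lhs = self.as_operand(lhs);
                let rhs = self.as_operand(rhs);
                self.statements.push(format!("{} = {} + {}", dest, lhs, rhs));
            }
        }
    }
}

/// Convenience wrapper: returns the flat statements for `dest = expr`.
fn lower_assignment(dest: &str, expr: &Expr) -> Vec<String> {
    let mut lowerer = Lowerer { next_tmp: 0, statements: Vec::new() };
    lowerer.assign(dest, expr);
    lowerer.statements
}
```

Lowering x = (a + b) + c with lower_assignment yields the two statements shown above, with TMP1 holding a + b.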

MIR data types

The MIR data types are defined in the src/librustc/mir/ module. Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.

The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for "promoted constants", but you can read about those below).

  • Basic blocks: The basic blocks are stored in the field basic_blocks; this is a vector of BasicBlockData structures. Nobody ever references a basic block directly: instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
  • Statements are represented by the type Statement.
  • Terminators are represented by the type Terminator.
  • Locals are represented by a newtype'd index type Local. The data for a local variable is found in the Mir (the local_decls vector). There is also a special constant RETURN_PLACE identifying the special "local" representing the return value.
  • Places are identified by the enum Place. There are a few variants:
    • Local variables like _1
    • Static variables FOO
    • Projections, which are fields or other things that "project out" from a base place. So e.g. the place _1.f is a projection, with f being the "projection element" and _1 being the base path. *_1 is also a projection, with the * being represented by the ProjectionElem::Deref element.
  • Rvalues are represented by the enum Rvalue.
  • Operands are represented by the enum Operand.
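The shape of these types can be sketched with a heavily simplified toy model. The definitions below borrow the names from the list above but are assumptions of this example: the real types carry far more information (types, spans, scopes) and differ in detail.

```rust
/// Newtype'd indices, as in the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BasicBlock(usize);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Local(usize);

/// The special "local" that holds the return value (`_0`).
const RETURN_PLACE: Local = Local(0);

enum Place {
    Local(Local),                     // _1
    Field(Box<Place>, &'static str),  // projection: base.f
    Deref(Box<Place>),                // projection: *base
}

enum Operand { Constant(i64), Copy(Place), Move(Place) }
enum Rvalue { Use(Operand), Ref { mutable: bool, place: Place }, Add(Operand, Operand) }
struct Statement { place: Place, rvalue: Rvalue }

enum Terminator {
    Goto { target: BasicBlock },
    Call { destination: Place, target: BasicBlock, unwind: Option<BasicBlock> },
    Return,
}

struct BasicBlockData { statements: Vec<Statement>, terminator: Terminator }

/// One function body: blocks are looked up through `BasicBlock` indices.
struct Body { basic_blocks: Vec<BasicBlockData> }

impl Body {
    fn block(&self, bb: BasicBlock) -> &BasicBlockData {
        &self.basic_blocks[bb.0]
    }
}

impl Terminator {
    /// Blocks that control may flow to after this terminator.
    fn successors(&self) -> Vec<BasicBlock> {
        match self {
            Terminator::Goto { target } => vec![*target],
            Terminator::Call { target, unwind, .. } => {
                let mut out = vec![*target];
                out.extend(*unwind);
                out
            }
            Terminator::Return => Vec::new(),
        }
    }
}

/// Renders a place in the pretty-printed style, e.g. `_1.f` or `*_3`.
fn place_to_string(place: &Place) -> String {
    match place {
        Place::Local(local) => format!("_{}", local.0),
        Place::Field(base, field) => format!("{}.{}", place_to_string(base), field),
        Place::Deref(base) => format!("*{}", place_to_string(base)),
    }
}
```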

Representing constants

to be written

to be written

MIR construction

The lowering of HIR to MIR occurs for the following (probably incomplete) list of items:

  • Function and Closure bodies
  • Initializers of static and const items
  • Initializers of enum discriminants
  • Glue and Shims of any kind
    • Tuple struct initializer functions
    • Drop code (the Drop::drop function is not called directly)
    • Drop implementations of types without an explicit Drop implementation

The lowering is triggered by calling the mir_built query. There is an intermediate representation between HIR and MIR called the HAIR that is only used during the lowering. The HAIR's most important feature is that the various adjustments (which happen without explicit syntax) like coercions, autoderef, autoref and overloaded method calls have become explicit casts, deref operations, reference expressions or concrete function calls.

The HAIR has datatypes that mirror the HIR datatypes, but instead of e.g. -x being a hair::ExprKind::Neg(hair::Expr) it is a hair::ExprKind::Neg(hir::Expr). This shallowness enables the HAIR to represent all datatypes that HIR has, but without having to create an in-memory copy of the entire HIR. MIR lowering will first convert the topmost expression from HIR to HAIR (in rustc_mir::hair::cx::expr) and then process the HAIR expressions recursively.

The lowering creates local variables for every argument as specified in the signature. Next it creates local variables for every binding specified (e.g. (a, b): (i32, String) produces 3 locals: one for the argument and two for the bindings). Next it generates field accesses that read the fields from the argument and write the values to the binding variables.

With this initialization out of the way, the lowering triggers a recursive call to a function that generates the MIR for the body (a Block expression) and writes the result into the RETURN_PLACE.

unpack! all the things

Functions that generate MIR tend to fall into one of two patterns. First, if the function generates only statements, then it will take a basic block as argument onto which those statements should be appended. It can then return a result as normal:

fn generate_some_mir(&mut self, block: BasicBlock) -> ResultType {
   ...
}

But there are other functions that may generate new basic blocks as well. For example, lowering an expression like if foo { 22 } else { 44 } requires generating a small "diamond-shaped graph". In this case, the functions take a basic block where their code starts and return a (potentially) new basic block where the code generation ends. The BlockAnd type is used to represent this:

fn generate_more_mir(&mut self, block: BasicBlock) -> BlockAnd<ResultType> {
    ...
}

When you invoke these functions, it is common to have a local variable block that is effectively a "cursor". It represents the point at which we are adding new MIR. When you invoke generate_more_mir, you want to update this cursor. You can do this manually, but it's tedious:

let mut block;
let v = match self.generate_more_mir(..) {
    BlockAnd { block: new_block, value: v } => {
        block = new_block;
        v
    }
};

For this reason, we offer a macro that lets you write let v = unpack!(block = self.generate_more_mir(...)). It simply extracts the new block and overwrites the variable block that you named in the unpack!.
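A self-contained toy version of this pattern might look as follows. The BlockAnd struct with named fields mirrors the manual match shown above; the Builder type, its counter, and the macro body are assumptions made for this sketch rather than the compiler's actual definitions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BasicBlock(usize);

/// A result value paired with the block where code generation ended.
struct BlockAnd<T> {
    block: BasicBlock,
    value: T,
}

/// `unpack!(block = call)` stores the returned block into `block`
/// and evaluates to the returned value.
macro_rules! unpack {
    ($block:ident = $call:expr) => {{
        let BlockAnd { block: new_block, value } = $call;
        $block = new_block;
        value
    }};
}

/// Hypothetical builder that "generates MIR" by handing out new blocks.
struct Builder {
    blocks_created: usize,
}

impl Builder {
    /// Pretends to generate code starting at `_start`, ending in a new block.
    fn generate_more_mir(&mut self, _start: BasicBlock) -> BlockAnd<u32> {
        self.blocks_created += 1;
        BlockAnd { block: BasicBlock(self.blocks_created), value: 22 }
    }
}

/// Uses `block` as a cursor that is advanced by each call.
fn lower_two_steps() -> (BasicBlock, u32) {
    let mut builder = Builder { blocks_created: 0 };
    let mut block = BasicBlock(0);
    let a = unpack!(block = builder.generate_more_mir(block));
    let b = unpack!(block = builder.generate_more_mir(block));
    (block, a + b)
}
```

After the two calls in lower_two_steps, the cursor points at the block returned by the second call, without any manual destructuring at the call sites.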

Lowering expressions into the desired MIR

There are essentially four kinds of representations one might want of an expression:

  • Place refers to a (or part of a) preexisting memory location (local, static, promoted)
  • Rvalue is something that can be assigned to a Place
  • Operand is an argument to e.g. a + operation or a function call
  • a temporary variable containing a copy of the value

The following image depicts a general overview of the interactions between the representations:


We start out with lowering the function body to an Rvalue so we can create an assignment to RETURN_PLACE. This Rvalue lowering will in turn trigger lowering to Operand for its arguments (if any). Operand lowering either produces a const operand, or moves/copies out of a Place, thus triggering a Place lowering. An expression being lowered to a Place can in turn trigger a temporary to be created if the expression being lowered contains operations. This is where the snake bites its own tail and we need to trigger an Rvalue lowering for the expression to be written into the local.

Operator lowering

Operators on builtin types are not lowered to function calls (which would end up in infinite recursion, because the trait impls just contain the operation itself again). Instead there are Rvalues for binary and unary operators and index operations. These Rvalues later get codegened to LLVM primitive operations or LLVM intrinsics.

Operators on all other types get lowered to a function call to their impl of the operator's corresponding trait.

Regardless of the lowering kind, the arguments to the operator are lowered to Operands. This means all arguments are either constants, or refer to an already existing value somewhere in a local or static.

Method call lowering

Method calls are lowered to the same TerminatorKind that function calls are. In MIR there is no difference between method calls and function calls anymore.

Conditions

if conditions and match statements for enums without variants with fields are lowered to TerminatorKind::SwitchInt. Each possible value (so 0 and 1 for if conditions) has a corresponding BasicBlock to which the code continues. The argument being branched on is (again) an Operand representing the value of the if condition.

Pattern matching

match statements for enums with variants that have fields are lowered to TerminatorKind::SwitchInt, too, but the Operand refers to a Place where the discriminant of the value can be found. This often involves reading the discriminant to a new temporary variable.

Aggregate construction

Aggregate values of any kind (e.g. structs or tuples) are built via Rvalue::Aggregate. All fields are lowered to Operands. This is essentially equivalent to one assignment statement per aggregate field plus an assignment to the discriminant in the case of enums.

MIR visitor

The MIR visitor is a convenient tool for traversing the MIR and either looking for things or making changes to it. The visitor traits are defined in the rustc::mir::visit module – there are two of them, generated via a single macro: Visitor (which operates on a &Mir and gives back shared references) and MutVisitor (which operates on a &mut Mir and gives back mutable references).

To implement a visitor, you have to create a type that represents your visitor. Typically, this type wants to "hang on" to whatever state you will need while processing MIR:

struct MyVisitor<...> {
    tcx: TyCtxt<'cx, 'tcx, 'tcx>,
    ...
}

and you then implement the Visitor or MutVisitor trait for that type:

impl<'tcx> MutVisitor<'tcx> for NoLandingPads {
    fn visit_foo(&mut self, ...) {
        ...
        self.super_foo(...);
    }
}

As shown above, within the impl, you can override any of the visit_foo methods (e.g., visit_terminator) in order to write some code that will execute whenever a foo is found. If you want to recursively walk the contents of the foo, you then invoke the super_foo method. (NB. You never want to override super_foo.)

A very simple example of a visitor can be found in NoLandingPads. That visitor doesn't even require any state: it just visits all terminators and removes their unwind successors.
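The visit_foo / super_foo split can be seen in miniature in the sketch below. It uses toy MIR types and a hand-written trait instead of the macro-generated one, so every name here (including NoUnwindEdges) is an assumption of the example; it only mirrors the idea of a stateless visitor that removes unwind successors.

```rust
#[derive(Debug, PartialEq)]
enum Terminator {
    Call { target: usize, unwind: Option<usize> },
    Return,
}

struct Block {
    terminator: Terminator,
}

struct Body {
    blocks: Vec<Block>,
}

/// `visit_*` methods are the override points; `super_*` methods do the
/// recursive walk and are not meant to be overridden.
trait MutVisitor {
    fn visit_body(&mut self, body: &mut Body) {
        self.super_body(body);
    }
    fn visit_block(&mut self, block: &mut Block) {
        self.super_block(block);
    }
    fn visit_terminator(&mut self, _terminator: &mut Terminator) {}

    fn super_body(&mut self, body: &mut Body) {
        for block in &mut body.blocks {
            self.visit_block(block);
        }
    }
    fn super_block(&mut self, block: &mut Block) {
        self.visit_terminator(&mut block.terminator);
    }
}

/// A visitor with no state that clears every unwind edge.
struct NoUnwindEdges;

impl MutVisitor for NoUnwindEdges {
    fn visit_terminator(&mut self, terminator: &mut Terminator) {
        if let Terminator::Call { unwind, .. } = terminator {
            *unwind = None;
        }
    }
}
```

Only visit_terminator is overridden; the default visit_body and visit_block keep walking the structure through their super_ counterparts.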

Traversal

In addition to the visitor, the rustc::mir::traversal module contains useful functions for walking the MIR CFG in different standard orders (e.g. pre-order, reverse post-order, and so forth).
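For readers unfamiliar with these orders, the following sketch computes a post-order and a reverse post-order over a control-flow graph given as successor lists. It is a generic illustration with made-up function names and does not use the traversal module's API.

```rust
/// Depth-first post-order: a block is emitted after all blocks reachable
/// through its successors. `successors[i]` lists the successors of block `i`.
fn postorder(successors: &[Vec<usize>], start: usize) -> Vec<usize> {
    fn visit(
        successors: &[Vec<usize>],
        node: usize,
        visited: &mut [bool],
        order: &mut Vec<usize>,
    ) {
        if visited[node] {
            return;
        }
        visited[node] = true;
        for &succ in &successors[node] {
            visit(successors, succ, visited, order);
        }
        order.push(node);
    }

    let mut visited = vec![false; successors.len()];
    let mut order = Vec::new();
    visit(successors, start, &mut visited, &mut order);
    order
}

/// Reverse post-order: in an acyclic graph, every block appears before
/// its successors, which suits forward analyses.
fn reverse_postorder(successors: &[Vec<usize>], start: usize) -> Vec<usize> {
    let mut order = postorder(successors, start);
    order.reverse();
    order
}
```

On a diamond-shaped graph (0 branches to 1 and 2, which both join at 3), the reverse post-order starts at 0 and ends at the join block 3.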

MIR passes

If you would like to get the MIR for a function (or constant, etc), you can use the optimized_mir(def_id) query. This will give you back the final, optimized MIR. For foreign def-ids, we simply read the MIR from the other crate's metadata. But for local def-ids, the query will construct the MIR and then iteratively optimize it by applying a series of passes. This section describes how those passes work and how you can extend them.

To produce the optimized_mir(D) for a given def-id D, the MIR passes through several suites of optimizations, each represented by a query. Each suite consists of multiple optimizations and transformations. These suites represent useful intermediate points where we want to access the MIR for type checking or other purposes:

  • mir_build(D) – not a query, but this constructs the initial MIR
  • mir_const(D) – applies some simple transformations to make MIR ready for constant evaluation;
  • mir_validated(D) – applies some more transformations, making MIR ready for borrow checking;
  • optimized_mir(D) – the final state, after all optimizations have been performed.

Seeing how the MIR changes as the compiler executes

-Zdump-mir=F is a handy compiler option that will let you view the MIR for each function at each stage of compilation. -Zdump-mir takes a filter F which allows you to control which functions and which passes you are interested in. For example:

> rustc -Zdump-mir=foo ...

This will dump the MIR for any function whose name contains foo; it will dump the MIR both before and after every pass. Those files will be created in the mir_dump directory. There will likely be quite a lot of them!

> cat > foo.rs
fn main() {
    println!("Hello, world!");
}
^D
> rustc -Zdump-mir=main foo.rs
> ls mir_dump/* | wc -l
     161

The files have names like rustc.main.000-000.CleanEndRegions.after.mir. These names have a number of parts:

rustc.main.000-000.CleanEndRegions.after.mir
      ---- --- --- --------------- ----- either before or after
      |    |   |   name of the pass
      |    |   index of dump within the pass (usually 0, but some passes dump intermediate states)
      |    index of the pass
      def-path to the function etc being dumped

You can also make more selective filters. For example, main & CleanEndRegions will select for things that reference both main and the pass CleanEndRegions:

> rustc -Zdump-mir='main & CleanEndRegions' foo.rs
> ls mir_dump
rustc.main.000-000.CleanEndRegions.after.mir	rustc.main.000-000.CleanEndRegions.before.mir

Filters can also have | parts to combine multiple sets of &-filters. For example main & CleanEndRegions | main & NoLandingPads will select either main and CleanEndRegions or main and NoLandingPads:

> rustc -Zdump-mir='main & CleanEndRegions | main & NoLandingPads' foo.rs
> ls mir_dump
rustc.main-promoted[0].002-000.NoLandingPads.after.mir
rustc.main-promoted[0].002-000.NoLandingPads.before.mir
rustc.main-promoted[0].002-006.NoLandingPads.after.mir
rustc.main-promoted[0].002-006.NoLandingPads.before.mir
rustc.main-promoted[1].002-000.NoLandingPads.after.mir
rustc.main-promoted[1].002-000.NoLandingPads.before.mir
rustc.main-promoted[1].002-006.NoLandingPads.after.mir
rustc.main-promoted[1].002-006.NoLandingPads.before.mir
rustc.main.000-000.CleanEndRegions.after.mir
rustc.main.000-000.CleanEndRegions.before.mir
rustc.main.002-000.NoLandingPads.after.mir
rustc.main.002-000.NoLandingPads.before.mir
rustc.main.002-006.NoLandingPads.after.mir
rustc.main.002-006.NoLandingPads.before.mir

(Here, the main-promoted[0] files refer to the MIR for "promoted constants" that appeared within the main function.)
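The filter semantics described above (alternatives separated by |, each a conjunction of &-parts matched by substring) can be modeled in a few lines. This is a sketch of the described behavior only; the function name and its parameters are assumptions, and it is not claimed to reproduce the compiler's matching code.

```rust
/// Returns true if a dump for `def_path` / `pass_name` would be selected
/// by `filter`, under the semantics described in the text:
/// - `|` separates alternatives, any one of which may match;
/// - within an alternative, every `&`-separated part must occur in
///   either the def-path or the pass name.
fn filter_matches(filter: &str, def_path: &str, pass_name: &str) -> bool {
    filter.split('|').any(|alternative| {
        alternative.split('&').all(|part| {
            let part = part.trim();
            def_path.contains(part) || pass_name.contains(part)
        })
    })
}
```

With the filter main & CleanEndRegions | main & NoLandingPads, this model selects NoLandingPads dumps for main and main-promoted[0], but not dumps of other passes.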

Implementing and registering a pass

A MirPass is some bit of code that processes the MIR, typically – but not always – transforming it along the way somehow. For example, it might perform an optimization. The MirPass trait itself is found in the rustc_mir::transform module, and it basically consists of one method, run_pass, that simply gets an &mut Mir (along with the tcx and some information about where it came from). The MIR is therefore modified in place (which helps to keep things efficient).

A good example of a basic MIR pass is NoLandingPads, which walks the MIR and removes all edges that are due to unwinding – this is used when configured with panic=abort, which never unwinds. As you can see from its source, a MIR pass is defined by first defining a dummy type, a struct with no fields, something like:


struct MyPass;

for which you then implement the MirPass trait. You can then insert this pass into the appropriate list of passes found in a query like optimized_mir, mir_validated, etc. (If this is an optimization, it should go into the optimized_mir list.)

If you are writing a pass, there's a good chance that you are going to want to use a MIR visitor. MIR visitors are a handy way to walk all the parts of the MIR, either to search for something or to make small edits.

Stealing

The intermediate queries mir_const() and mir_validated() yield up a &'tcx Steal<Mir<'tcx>>, allocated using tcx.alloc_steal_mir(). This indicates that the result may be stolen by the next suite of optimizations – this is an optimization to avoid cloning the MIR. Attempting to use a stolen result will cause a panic in the compiler. Therefore, it is important that you do not read directly from these intermediate queries except as part of the MIR processing pipeline.

Because of this stealing mechanism, some care must also be taken to ensure that, before the MIR at a particular phase in the processing pipeline is stolen, anyone who may want to read from it has already done so. Concretely, this means that if you have some query foo(D) that wants to access the result of mir_const(D) or mir_validated(D), you need to have the successor pass "force" foo(D) using ty::queries::foo::force(...). This will force a query to execute even though you don't directly require its result.

As an example, consider MIR const qualification. It wants to read the result produced by the mir_const() suite. However, that result will be stolen by the mir_validated() suite. If nothing was done, then mir_const_qualif(D) would succeed if it came before mir_validated(D), but fail otherwise. Therefore, mir_validated(D) will force mir_const_qualif before it actually steals, thus ensuring that the reads have already happened (remember that queries are memoized, so executing a query twice simply loads from a cache the second time):

mir_const(D) --read-by--> mir_const_qualif(D)
     |                       ^
  stolen-by                  |
     |                    (forces)
     v                       |
mir_validated(D) ------------+

This mechanism is a bit dodgy. There is a discussion of more elegant alternatives in rust-lang/rust#41710.
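To make the behavior concrete, here is a small single-threaded sketch of a steal-able container. It is an assumption-laden toy (built on RefCell, with method names picked for this example), not the compiler's Steal type, but it shows the contract: readers may borrow until the value is stolen, and any read after that panics.

```rust
use std::cell::{Ref, RefCell};

/// A value that can be read by shared reference until someone steals it.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Reads the value; panics if it has already been stolen.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |slot| {
            slot.as_ref().expect("attempted to read from stolen value")
        })
    }

    /// Takes ownership of the value without cloning it; panics if it
    /// has already been stolen.
    fn steal(&self) -> T {
        self.value
            .borrow_mut()
            .take()
            .expect("attempted to steal a value twice")
    }

    /// Reports whether the value is gone (helper added for the example).
    fn is_stolen(&self) -> bool {
        self.value.borrow().is_none()
    }
}
```

In terms of the diagram above, mir_const_qualif corresponds to a borrow that must happen before mir_validated calls steal; forcing the query first guarantees that ordering.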

MIR optimizations

MIR borrow check

The borrow check is Rust's "secret sauce" – it is tasked with enforcing a number of properties:

  • That all variables are initialized before they are used.
  • That you can't move the same value twice.
  • That you can't move a value while it is borrowed.
  • That you can't access a place while it is mutably borrowed (except through the reference).
  • That you can't mutate a place while it is shared borrowed.
  • etc

At the time of this writing, the code is in a state of transition. The "main" borrow checker still works by processing the HIR, but that is being phased out in favor of the MIR-based borrow checker. Accordingly, this documentation focuses on the new, MIR-based borrow checker.

Doing borrow checking on MIR has several advantages:

Major phases of the borrow checker

The borrow checker source is found in the rustc_mir::borrow_check module. The main entry point is the mir_borrowck query.

  • We first create a local copy of the MIR. In the coming steps, we will modify this copy in place to modify the types and things to include references to the new regions that we are computing.
  • We then invoke replace_regions_in_mir to modify our local MIR. Among other things, this function will replace all of the regions in the MIR with fresh inference variables.
  • Next, we perform a number of dataflow analyses that compute what data is moved and when.
  • We then do a second type check across the MIR: the purpose of this type check is to determine all of the constraints between different regions.
  • Next, we do region inference, which computes the values of each region — basically, points in the control-flow graph.
  • At this point, we can compute the "borrows in scope" at each point.
  • Finally, we do a second walk over the MIR, looking at the actions it does and reporting errors. For example, if we see a statement like *a + 1, then we would check that the variable a is initialized and that it is not mutably borrowed, as either of those would require an error to be reported.
    • Doing this check requires the results of all the previous analyses.

Tracking moves and initialization

Part of the borrow checker's job is to track which variables are "initialized" at any given point in time -- this also requires figuring out where moves occur and tracking those.

Initialization and moves

From a user's perspective, initialization -- giving a variable some value -- and moves -- transferring ownership to another place -- might seem like distinct topics. Indeed, our borrow checker error messages often talk about them differently. But within the borrow checker, they are not nearly as separate. Roughly speaking, the borrow checker tracks the set of "initialized places" at any point in the source code. Assigning to a previously uninitialized local variable adds it to that set; moving from a local variable removes it from that set.

Consider this example:

fn foo() {
    let a: Vec<u32>;
    
    // a is not initialized yet
    
    a = vec![22];
    
    // a is initialized here
    
    std::mem::drop(a); // a is moved here
    
    // a is no longer initialized here

    let l = a.len(); //~ ERROR
}

Here you can see that a starts off as uninitialized; once it is assigned, it becomes initialized. But when drop(a) is called, that moves a into the call, and hence it becomes uninitialized again.
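
The "set of initialized places" idea can be sketched as a toy model. This is a minimal sketch, assuming straight-line code and whole local variables only; the type InitSet and its methods are invented for illustration, and the real borrow checker computes this information with dataflow analyses over the control-flow graph.

```rust
use std::collections::HashSet;

/// Toy model: the set of locals that are initialized at the current point.
/// (Hypothetical type; not part of the compiler.)
#[derive(Default)]
struct InitSet {
    initialized: HashSet<&'static str>,
}

impl InitSet {
    /// Assigning to a local adds it to the set.
    fn assign(&mut self, local: &'static str) {
        self.initialized.insert(local);
    }

    /// Moving out of a local removes it from the set.
    /// Moving a local that is not in the set is an error.
    fn move_out(&mut self, local: &'static str) -> Result<(), String> {
        if self.initialized.remove(local) {
            Ok(())
        } else {
            Err(format!("move of uninitialized or moved value `{}`", local))
        }
    }

    /// A plain use only requires that the local is currently in the set.
    fn use_local(&self, local: &'static str) -> Result<(), String> {
        if self.initialized.contains(local) {
            Ok(())
        } else {
            Err(format!("use of uninitialized or moved value `{}`", local))
        }
    }
}
```

Replaying foo from the example: a use before assign("a") fails, a use after it succeeds, and after move_out("a") (the drop call) the final a.len() fails again.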

Subsections

To make it easier to peruse, this section is broken into a number of subsections:

  • Move paths: the move path concept that we use to track which local variables (or parts of local variables, in some cases) are initialized.
  • TODO Rest not yet written =)

Move paths

In reality, it's not enough to track initialization at the granularity of local variables. Rust also allows us to do moves and initialization at the field granularity:

fn foo() {
    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
    
    // a.0 and a.1 are both initialized
    
    let b = a.0; // moves a.0
    
    // a.0 is not initialized, but a.1 still is

    let c = a.0; // ERROR
    let d = a.1; // OK
}

To handle this, we track initialization at the granularity of a move path. A MovePath represents some location that the user can initialize, move, etc. So e.g. there is a move-path representing the local variable a, and there is a move-path representing a.0. Move paths roughly correspond to the concept of a Place from MIR, but they are indexed in ways that enable us to do move analysis more efficiently.

Move path indices

Although there is a MovePath data structure, move paths are never referenced directly. Instead, all the code passes around indices of type MovePathIndex. If you need to get information about a move path, you use this index with the move_paths field of the MoveData. For example, to convert a MovePathIndex mpi into a MIR Place, you might access the MovePath::place field like so:

move_data.move_paths[mpi].place
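
The index-based pattern can be sketched with simplified stand-in types. This is only a model: the names mirror those used in the text, but the fields are assumptions (place is a plain string here rather than a MIR Place), and the push helper and the MovePaths wrapper are invented for this sketch.

```rust
use std::ops::Index;

/// Stand-in for the compiler's index type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MovePathIndex(usize);

/// Stand-in for a move path; the real `place` field holds a MIR `Place`.
#[derive(Debug)]
struct MovePath {
    place: String,
}

/// A vector that may only be indexed with a `MovePathIndex`.
#[derive(Default)]
struct MovePaths(Vec<MovePath>);

impl Index<MovePathIndex> for MovePaths {
    type Output = MovePath;
    fn index(&self, mpi: MovePathIndex) -> &MovePath {
        &self.0[mpi.0]
    }
}

#[derive(Default)]
struct MoveData {
    move_paths: MovePaths,
}

impl MoveData {
    /// Hypothetical helper: stores a new move path and returns its index.
    fn push(&mut self, place: &str) -> MovePathIndex {
        let mpi = MovePathIndex(self.move_paths.0.len());
        self.move_paths.0.push(MovePath { place: place.to_string() });
        mpi
    }
}
```

With this model, code holds on to the small Copy index returned by push and writes move_data.move_paths[mpi].place whenever it needs the underlying data.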

Building move paths

One of the first things we do in the MIR borrow check is to construct the set of move paths. This is done as part of the MoveData::gather_moves function. This function uses a MIR visitor called Gatherer to walk the MIR and look at how each Place within is accessed. For each such Place, it constructs a corresponding MovePathIndex. It also records when/where that particular move path is moved/initialized, but we'll get to that in a later section.

Illegal move paths

We don't actually create a move-path for every Place that gets used. In particular, if it is illegal to move from a Place, then there is no need for a MovePathIndex. Some examples:

  • You cannot move from a static variable, so we do not create a MovePathIndex for static variables.
  • You cannot move an individual element of an array, so if we have e.g. foo: [String; 3], there would be no move-path for foo[1].
  • You cannot move from inside of a borrowed reference, so if we have e.g. foo: &String, there would be no move-path for *foo.

These rules are enforced by the move_path_for function, which converts a Place into a MovePathIndex -- in error cases like those just discussed, the function returns an Err. This in turn means we don't have to bother tracking whether those places are initialized (which lowers overhead).

Looking up a move-path

If you have a Place and you would like to convert it to a MovePathIndex, you can do that using the MovePathLookup structure found in the rev_lookup field of MoveData. There are two different methods:

  • find_local, which takes a mir::Local representing a local variable. This is the easier method, because we always create a MovePathIndex for every local variable.
  • find, which takes an arbitrary Place. This method is a bit more annoying to use, precisely because we don't have a MovePathIndex for every Place (as we just discussed in the "illegal move paths" section). Therefore, find returns a LookupResult indicating the closest path it was able to find that exists (e.g., for foo[1], it might return just the path for foo).
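
The difference between the two lookups can be sketched as follows. This is a simplified model, not the compiler's implementation: a place is represented as a local name plus a list of projection strings, the two hash maps are invented storage, and the LookupResult variant names are assumptions of this sketch.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct MovePathIndex(usize);

/// Either the exact move path, or the closest ancestor that exists.
#[derive(Debug, PartialEq, Eq)]
enum LookupResult {
    Exact(MovePathIndex),
    Parent(Option<MovePathIndex>),
}

#[derive(Default)]
struct MovePathLookup {
    /// Every local has a move path.
    locals: HashMap<String, MovePathIndex>,
    /// (parent path, projection) -> child path, only for paths that exist.
    projections: HashMap<(MovePathIndex, String), MovePathIndex>,
}

impl MovePathLookup {
    /// Always succeeds for a known local (panics on an unknown name).
    fn find_local(&self, local: &str) -> MovePathIndex {
        self.locals[local]
    }

    /// Walks the projections one by one; stops at the last path that exists.
    fn find(&self, local: &str, projections: &[&str]) -> LookupResult {
        let mut current = self.find_local(local);
        for proj in projections {
            match self.projections.get(&(current, proj.to_string())) {
                Some(&child) => current = child,
                None => return LookupResult::Parent(Some(current)),
            }
        }
        LookupResult::Exact(current)
    }
}
```

In this model, looking up foo[1] when only foo has a move path yields Parent(Some(..)) with the index of foo, matching the behavior described above.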

Cross-references

As we noted above, move-paths are stored in a big vector and referenced via their MovePathIndex. However, within this vector, they are also structured into a tree. So for example if you have the MovePathIndex for a.b.c, you can go to its parent move-path a.b. You can also iterate over all children paths: so, from a.b, you might iterate to find the path a.b.c (here you are iterating just over the paths that are actually referenced in the source, not all possible paths that could have been referenced). These references are used for example in the has_any_child_of function, which checks whether the dataflow results contain a value for the given move-path (e.g., a.b) or any child of that move-path (e.g., a.b.c).
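
One way to picture these cross-references is a vector of nodes that link to each other by index. The sketch below is an assumption-laden model: the field names parent, first_child and next_sibling, the add helper, and the use of a bool slice in place of dataflow results are all choices made for this example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MovePathIndex(usize);

/// Tree links are stored as indices into the same vector.
struct MovePath {
    parent: Option<MovePathIndex>,
    first_child: Option<MovePathIndex>,
    next_sibling: Option<MovePathIndex>,
}

struct MoveData {
    move_paths: Vec<MovePath>,
}

impl MoveData {
    /// Hypothetical helper: adds a path below `parent` (or a root if `None`).
    fn add(&mut self, parent: Option<MovePathIndex>) -> MovePathIndex {
        let mpi = MovePathIndex(self.move_paths.len());
        // The new path becomes the head of the parent's child list.
        let next_sibling = match parent {
            Some(p) => self.move_paths[p.0].first_child.replace(mpi),
            None => None,
        };
        self.move_paths.push(MovePath { parent, first_child: None, next_sibling });
        mpi
    }

    /// Returns some path in the subtree rooted at `root` (including `root`)
    /// that is marked in `set`, if there is one.
    fn has_any_child_of(&self, root: MovePathIndex, set: &[bool]) -> Option<MovePathIndex> {
        let mut todo = vec![root];
        while let Some(mpi) = todo.pop() {
            if set[mpi.0] {
                return Some(mpi);
            }
            let mut child = self.move_paths[mpi.0].first_child;
            while let Some(c) = child {
                todo.push(c);
                child = self.move_paths[c.0].next_sibling;
            }
        }
        None
    }
}
```

Only paths that were actually added are visited, which mirrors the point that iteration covers the paths referenced in the source rather than every path that could exist.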

The MIR type-check

A key component of the borrow check is the MIR type-check. This check walks the MIR and does a complete "type check" -- the same kind you might find in any other language. In the process of doing this type-check, we also uncover the region constraints that apply to the program.

TODO -- elaborate further? Maybe? :)

Region inference (NLL)

The MIR-based region checking code is located in the rustc_mir::borrow_check::nll module. (NLL, of course, stands for "non-lexical lifetimes", a term that will hopefully be deprecated once they become the standard kind of lifetime.)

The MIR-based region analysis consists of two major functions:

  • replace_regions_in_mir, invoked first, has two jobs:
    • First, it finds the set of regions that appear within the signature of the function (e.g., 'a in fn foo<'a>(&'a u32) { ... }). These are called the "universal" or "free" regions – in particular, they are the regions that appear free in the function body.
    • Second, it replaces all the regions from the function body with fresh inference variables. This is because (presently) those regions are the results of lexical region inference and hence are not of much interest. The intention is that – eventually – they will be "erased regions" (i.e., no information at all), since we won't be doing lexical region inference at all.
  • compute_regions, invoked second: this is given as argument the results of move analysis. It has the job of computing values for all the inference variables that replace_regions_in_mir introduced.
    • To do that, it first runs the MIR type checker. This is basically a normal type-checker but specialized to MIR, which is much simpler than full Rust of course. Running the MIR type checker will however create outlives constraints between region variables (e.g., that one variable must outlive another one) to reflect the subtyping relationships that arise.
    • It also adds liveness constraints that arise from where variables are used.
    • More details to come, though the NLL RFC also includes fairly thorough (and hopefully readable) coverage.

Universal regions

to be written – explain the UniversalRegions type

Region variables and constraints

to be written – describe the RegionInferenceContext and the role of liveness_constraints vs other constraints, plus

Closures

to be written

The MIR type-check

Representing the "values" of a region variable

The value of a region can be thought of as a set; we call the domain of this set a RegionElement. In the code, the value for all regions is maintained in the rustc_mir::borrow_check::nll::region_infer module. For each region we maintain a set storing what elements are present in its value (to make this efficient, we give each kind of element an index, the RegionElementIndex, and use sparse bitsets).

The kinds of region elements are as follows:

  • Each location in the MIR control-flow graph: a location is just the pair of a basic block and an index. This identifies the point on entry to the statement with that index (or the terminator, if the index is equal to statements.len()).
  • There is an element end('a) for each universal region 'a, corresponding to some portion of the caller's (or caller's caller, etc) control-flow graph.
  • Similarly, there is an element denoted end('static) corresponding to the remainder of program execution after this function returns.
  • There is an element !1 for each placeholder region !1. This corresponds (intuitively) to some unknown set of other elements – for details on placeholders, see the section placeholders and universes.
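
The kinds of elements listed above could be modeled roughly like this. It is a sketch for orientation only: the variant names are assumptions, universal regions are identified by a string, and a BTreeSet stands in for the indexed sparse bitsets mentioned earlier.

```rust
use std::collections::BTreeSet;

/// A point in the MIR control-flow graph.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Location {
    block: usize,
    statement_index: usize,
}

/// Illustrative model of the element kinds (variant names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum RegionElement {
    /// A location in the control-flow graph.
    Location(Location),
    /// `end('a)` for a universal region, including `end('static)`.
    End(&'static str),
    /// The element for the placeholder region with this universe number.
    Placeholder(u32),
}

/// The value of a region is a set of elements.
type RegionValue = BTreeSet<RegionElement>;

/// A location refers to the terminator when its index equals the number
/// of statements in the block.
fn is_terminator(location: Location, num_statements: usize) -> bool {
    location.statement_index == num_statements
}
```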

Causal tracking

to be written – describe how we can extend the values of a variable with causal tracking etc

Placeholders and universes

(This section describes ongoing work that hasn't landed yet.)

From time to time we have to reason about regions that we can't concretely know. For example, consider this program:

// A function that needs a static reference
fn foo(x: &'static u32) { }

fn bar(f: for<'a> fn(&'a u32)) {
       // ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference
    let x = 22;
    f(&x);
}

fn main() {
    bar(foo);
}

This program ought not to type-check: foo needs a static reference for its argument, and bar wants to be given a function that accepts any reference (so it can call it with something on its stack, for example). But how do we reject it and why?

Subtyping and Placeholders

When we type-check main, and in particular the call bar(foo), we are going to wind up with a subtyping relationship like this one:

fn(&'static u32) <: for<'a> fn(&'a u32)
----------------    -------------------
the type of `foo`   the type `bar` expects

We handle this sort of subtyping by taking the variables that are bound in the supertype and replacing them with universally quantified representatives, written like !1. We call these regions "placeholder regions" – they represent, basically, "some unknown region".

Once we've done that replacement, we have the following relation:

fn(&'static u32) <: fn(&'!1 u32)

The key idea here is that this unknown region '!1 is not related to any other regions. So if we can prove that the subtyping relationship is true for '!1, then it ought to be true for any region, which is what we wanted.

So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship (fn arguments are contravariant, so we swap the left and right here):

&'!1 u32 <: &'static u32

According to the basic subtyping rules for a reference, this will be true if '!1: 'static. That is – if "some unknown region !1" outlives 'static. Now, this might be true – after all, '!1 could be 'static – but we don't know that it's true. So this should yield up an error (eventually).

What is a universe

In the previous section, we introduced the idea of a placeholder region, and we denoted it !1. We call this number 1 the universe index. The idea of a "universe" is that it is a set of names that are in scope within some type or at some point. Universes are formed into a tree, where each child extends its parents with some new names. So the root universe conceptually contains global names, such as the lifetime 'static or the type i32. In the compiler, we also put generic type parameters into this root universe (in this sense, there is not just one root universe, but one per item). So consider this function bar:

struct Foo { }

fn bar<'a, T>(t: &'a T) {
    ...
}

Here, the root universe would consist of the lifetimes 'static and 'a. In fact, although we're focused on lifetimes here, we can apply the same concept to types, in which case the types Foo and T would be in the root universe (along with other global types, like i32). Basically, the root universe contains all the names that appear free in the body of bar.

Now let's extend bar a bit by adding a variable x:

fn bar<'a, T>(t: &'a T) {
    let x: for<'b> fn(&'b u32) = ...;
}

Here, the name 'b is not part of the root universe. Instead, when we "enter" into this for<'b> (e.g., by replacing it with a placeholder), we will create a child universe of the root, let's call it U1:

U0 (root universe)
│
└─ U1 (child universe)

The idea is that this child universe U1 extends the root universe U0 with a new name, which we are identifying by its universe number: !1.

Now let's extend bar a bit by adding one more variable, y:

fn bar<'a, T>(t: &'a T) {
    let x: for<'b> fn(&'b u32) = ...;
    let y: for<'c> fn(&'c u32) = ...;
}

When we enter this type, we will again create a new universe, which we'll call U2. Its parent will be the root universe, and U1 will be its sibling:

U0 (root universe)
│
├─ U1 (child universe)
│
└─ U2 (child universe)

This implies that, while in U2, we can name things from U0 or U2, but not U1.

Giving existential variables a universe. Now that we have this notion of universes, we can use it to extend our type-checker and things to prevent illegal names from leaking out. The idea is that we give each inference (existential) variable – whether it be a type or a lifetime – a universe. That variable's value can then only reference names visible from that universe. So for example if a lifetime variable is created in U0, then it cannot be assigned a value of !1 or !2, because those names are not visible from the universe U0.

Representing universes with just a counter. You might be surprised to see that the compiler doesn't keep track of a full tree of universes. Instead, it just keeps a counter – and, to determine if one universe can see another one, it just checks whether its index is greater than or equal to the other's. For example, U2 can see U0 because 2 >= 0. But U0 cannot see U2, because 0 >= 2 is false.

How can we get away with this? Doesn't this mean that we would allow U2 to also see U1? The answer is that, yes, we would, if that question ever arose. But because of the structure of our type checker etc, there is no way for that to happen. In order for something happening in the universe U1 to "communicate" with something happening in U2, they would have to have a shared inference variable X in common. And because everything in U1 is scoped to just U1 and its children, that inference variable X would have to be in U0. And since X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest to see by using a kind of generic "logic" example:

exists<X> {
   forall<Y> { ... /* Y is in U1 ... */ }
   forall<Z> { ... /* Z is in U2 ... */ }
}

Here, the only way for the two foralls to interact would be through X, but neither Y nor Z are in scope when X is declared, so its value cannot reference either of them.
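
The counter-based visibility check described above fits in a few lines. This is a minimal sketch; the type and method names are chosen for illustration and are not claimed to match the compiler's definitions.

```rust
/// A universe is identified by nothing more than its counter value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UniverseIndex(u32);

impl UniverseIndex {
    /// The root universe U0.
    const ROOT: UniverseIndex = UniverseIndex(0);

    /// A fresh universe is simply the next counter value.
    fn next_universe(self) -> UniverseIndex {
        UniverseIndex(self.0 + 1)
    }

    /// `self` can see names from `other` if its index is at least as large.
    fn can_name(self, other: UniverseIndex) -> bool {
        self.0 >= other.0
    }
}
```

Note that in this model U2 can also "see" U1, even though they are siblings in the conceptual tree; as argued above, that over-approximation is harmless because the question never arises in practice.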

Universes and placeholder region elements

But where does that error come from? The way it happens is like this. When we are constructing the region inference context, we can tell from the type inference context how many placeholder variables exist (the InferCtxt has an internal counter). For each of those, we create a corresponding universal region variable !n and a "region element" placeholder(n). This corresponds to "some unknown set of other elements". The value of !n is {placeholder(n)}.

At the same time, we also give each existential variable a universe (also taken from the InferCtxt). This universe determines which placeholder elements may appear in its value: For example, a variable in universe U3 may name placeholder(1), placeholder(2), and placeholder(3), but not placeholder(4). Note that the universe of an inference variable controls what region elements can appear in its value; it does not say region elements will appear.

Placeholders and outlives constraints

In the region inference engine, outlives constraints have the form:

V1: V2 @ P

where V1 and V2 are region indices, and hence map to some region variable (which may be universally or existentially quantified). The P here is a "point" in the control-flow graph; it's not important for this section. Each of these variables will have a universe, so let's call those universes U(V1) and U(V2) respectively. (Actually, the only one we are going to care about is U(V1).)

When we encounter this constraint, the ordinary procedure is to start a DFS from P. We keep walking so long as the nodes we are walking are present in value(V2) and we add those nodes to value(V1). If we reach a return point, we add in any end(X) elements. That part remains unchanged.

But after that, we want to iterate over the placeholder(x) elements in value(V2) (each of those must be visible to U(V2), but we should be able to just assume that is true; we don't have to check it). We have to ensure that value(V1) outlives each of those placeholder elements.

Now there are two ways that could happen. First, if U(V1) can see the universe x (i.e., x <= U(V1)), then we can just add placeholder(x) to value(V1) and be done. But if not, then we have to approximate: we may not know what set of elements placeholder(x) represents, but we should be able to compute some sort of upper bound B for it – some region B that outlives placeholder(x). For now, we'll just use 'static for that (since it outlives everything) – in the future, we can sometimes be smarter here (and in fact we have code for doing this already in other contexts). Moreover, since 'static is in the root universe U0, we know that all variables can see it – so basically if we find that value(V2) contains placeholder(x) for some universe x that V1 can't see, then we force V1 to 'static.
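
The procedure for a single constraint V1: V2 @ P can be sketched as below. This is a simplified model under several assumptions: points are plain integers, the CFG is a successor map, the end(X) elements added at return points are left out, and "forcing V1 to 'static" is recorded as a boolean flag. All names are invented for the sketch.

```rust
use std::collections::{BTreeSet, HashMap};

type Point = usize;

/// Simplified region value: CFG points, placeholder elements, and a flag
/// standing in for "this region was forced to 'static".
#[derive(Clone, Debug, Default, PartialEq)]
struct RegionValue {
    points: BTreeSet<Point>,
    placeholders: BTreeSet<u32>,
    forced_static: bool,
}

/// Applies `v1: v2 @ p`, where `v1_universe` is U(V1).
fn apply_outlives(
    successors: &HashMap<Point, Vec<Point>>,
    v1: &mut RegionValue,
    v1_universe: u32,
    v2: &RegionValue,
    p: Point,
) {
    // Step 1: DFS from `p`, walking only through points in value(V2),
    // and adding every visited point to value(V1).
    let mut stack = vec![p];
    let mut visited = BTreeSet::new();
    while let Some(q) = stack.pop() {
        if !v2.points.contains(&q) || !visited.insert(q) {
            continue;
        }
        v1.points.insert(q);
        if let Some(next) = successors.get(&q) {
            stack.extend(next.iter().copied());
        }
    }

    // Step 2: handle the placeholder elements of value(V2).
    for &x in &v2.placeholders {
        if x <= v1_universe {
            // U(V1) can see universe x: just add the element.
            v1.placeholders.insert(x);
        } else {
            // Otherwise approximate with the upper bound 'static.
            v1.forced_static = true;
        }
    }
}
```

For example, with a straight-line CFG 0 → 1 → 2 → 3, value(V2) = {1, 2, placeholder(1), placeholder(3)}, U(V1) = U1 and P = 1, the sketch adds points 1 and 2 and placeholder(1) to value(V1), and marks V1 as forced to 'static because of placeholder(3).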

Extending the "universal regions" check

After all constraints have been propagated, the NLL region inference has one final check, where it goes over the values that wound up being computed for each universal region and checks that they did not get 'too large'. In our case, we will go through each placeholder region and check that it contains only the placeholder(u) element it is known to outlive. (Later, we might be able to know that there are relationships between two placeholder regions and take those into account, as we do for universal regions from the fn signature.)

Put another way, the "universal regions" check can be considered to be checking constraints like:

{placeholder(1)}: V1

where {placeholder(1)} is like a constant set, and V1 is the variable we made to represent the !1 region.

Back to our example

OK, so far so good. Now let's walk through what would happen with our first example:

fn(&'static u32) <: fn(&'!1 u32) @ P  // this point P is not important here

The region inference engine will create a region element domain like this:

{ CFG; end('static); placeholder(1) }
    ---  ------------  ------- from the universe `!1`
    |    'static is always in scope
    all points in the CFG; not especially relevant here

It will always create two universal variables, one representing 'static and one representing '!1. Let's call them Vs and V1. They will have initial values like so:

Vs = { CFG; end('static) } // it is in U0, so can't name anything else
V1 = { placeholder(1) }

From the subtyping constraint above, we would have an outlives constraint like

'!1: 'static @ P

To process this, we would grow the value of V1 to include all of Vs:

Vs = { CFG; end('static) }
V1 = { CFG; end('static), placeholder(1) }

At that point, constraint propagation is complete, because all the outlives relationships are satisfied. Then we would go to the "check universal regions" portion of the code, which would test that no universal region grew too large.

In this case, V1 did grow too large – it is not known to outlive end('static), nor any of the CFG – so we would report an error.

Another example

What about this subtyping relationship?

for<'a> fn(&'a u32, &'a u32)
    <:
for<'b, 'c> fn(&'b u32, &'c u32)

Here we would replace the bound regions in the supertype with placeholders, as before, yielding:

for<'a> fn(&'a u32, &'a u32)
    <:
fn(&'!1 u32, &'!2 u32)

then we instantiate the variable on the left-hand side with an existential in universe U2, yielding the following (?n is a notation for an existential variable):

fn(&'?3 u32, &'?3 u32)
    <:
fn(&'!1 u32, &'!2 u32)

Then we break this down further:

&'!1 u32 <: &'?3 u32
&'!2 u32 <: &'?3 u32

and even further, yield up our region constraints:

'!1: '?3
'!2: '?3

Note that, in this case, both '!1 and '!2 have to outlive the variable '?3, but the variable '?3 is not forced to outlive anything else. Therefore, it simply starts and ends as the empty set of elements, and hence the type-check succeeds here.

(This should surprise you a little. It surprised me when I first realized it. We are saying that if we are a fn that needs both of its arguments to have the same region, we can accept being called with arguments with two distinct regions. That seems intuitively unsound. But in fact, it's fine, as I tried to explain in this issue on the Rust issue tracker long ago. The reason is that even if we get called with arguments of two distinct lifetimes, those two lifetimes have some intersection (the call itself), and that intersection can be our value of 'a that we use as the common lifetime of our arguments. -nmatsakis)
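
The same relationship can be observed from ordinary Rust code. The sketch below is expected to compile with a current compiler; the function names are invented, and a u32 return type is added (it mentions no regions, so it does not change the region constraints) purely so that the result can be checked.

```rust
/// Needs both arguments to share one lifetime `'a`.
fn sum_same<'a>(x: &'a u32, y: &'a u32) -> u32 {
    *x + *y
}

/// Expects a function that accepts two references with unrelated lifetimes.
fn call_with_distinct(f: for<'b, 'c> fn(&'b u32, &'c u32) -> u32) -> u32 {
    let first = 1;
    let second = 2;
    f(&first, &second)
}

fn subtyping_demo() -> u32 {
    // The subtype from the example above...
    let narrow: for<'a> fn(&'a u32, &'a u32) -> u32 = sum_same;
    // ...is accepted where the supertype is expected.
    call_with_distinct(narrow)
}
```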

Final example

Let's look at one last example. We'll extend the previous one to have a return type:

for<'a> fn(&'a u32, &'a u32) -> &'a u32
    <:
for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32

Despite seeming very similar to the previous example, this case is going to get an error. That's good: the problem is that we've gone from a fn that promises to return one of its two arguments, to a fn that is promising to return the first one. That is unsound. Let's see how it plays out.

First, we replace the bound regions in the supertype with placeholders:

for<'a> fn(&'a u32, &'a u32) -> &'a u32
    <:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32

Then we instantiate the subtype with existentials (in U2):

fn(&'?3 u32, &'?3 u32) -> &'?3 u32
    <:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32

And now we create the subtyping relationships:

&'!1 u32 <: &'?3 u32 // arg 1
&'!2 u32 <: &'?3 u32 // arg 2
&'?3 u32 <: &'!1 u32 // return type

And finally the outlives relationships. Here, let V1, V2, and V3 be the variables we assign to !1, !2, and ?3 respectively:

V1: V3
V2: V3
V3: V1

Those variables will have these initial values:

V1 in U1 = {placeholder(1)}
V2 in U2 = {placeholder(2)}
V3 in U2 = {}

Now because of the V3: V1 constraint, we have to add placeholder(1) into V3 (and indeed it is visible from V3), so we get:

V3 in U2 = {placeholder(1)}

then we have this constraint V2: V3, so we wind up having to enlarge V2 to include placeholder(1) (which it can also see):

V2 in U2 = {placeholder(1), placeholder(2)}

Now constraint propagation is done, but when we check the outlives relationships, we find that V2 includes this new element placeholder(1), so we report an error.
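
The propagation in this final example can be replayed with a small fixed-point loop. This is a sketch under simplifying assumptions: region values contain only placeholder numbers, every placeholder involved is visible to every variable (which holds here, since the affected variables live in U2), and the function names are invented.

```rust
use std::collections::BTreeSet;

/// Repeatedly applies `sup: sub` constraints until nothing changes:
/// everything in value(sub) must also be in value(sup).
fn propagate(values: &mut [BTreeSet<u32>], constraints: &[(usize, usize)]) {
    let mut changed = true;
    while changed {
        changed = false;
        for &(sup, sub) in constraints {
            let missing: Vec<u32> =
                values[sub].difference(&values[sup]).copied().collect();
            if !missing.is_empty() {
                values[sup].extend(missing);
                changed = true;
            }
        }
    }
}

/// The final check for a placeholder region: its value may contain only
/// its own placeholder element.
fn grew_too_large(value: &BTreeSet<u32>, own_placeholder: u32) -> bool {
    value.iter().any(|&p| p != own_placeholder)
}
```

Using index 0 for V1, 1 for V2 and 2 for V3, the initial values {placeholder(1)}, {placeholder(2)}, {} together with the constraints (0, 2), (1, 2), (2, 0) end up as V3 = {placeholder(1)} and V2 = {placeholder(1), placeholder(2)}, so the check flags V2, as described above.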

Constant Evaluation

Constant evaluation is the process of computing values at compile time. For a specific item (constant/static/array length) this happens after the MIR for the item is borrow-checked and optimized. In many cases trying to const evaluate an item will trigger the computation of its MIR for the first time.

Prominent examples are

  • The initializer of a static
  • Array length
    • needs to be known to reserve stack or heap space
  • Enum variant discriminants
    • needs to be known to prevent two variants from having the same discriminant
  • Patterns
    • need to be known to check for overlapping patterns

Additionally, constant evaluation can be used to reduce the workload or binary size at runtime by precomputing complex operations at compile time and only storing the result.

Constant evaluation can be done by calling the const_eval query of TyCtxt.

The const_eval query takes the ParamEnv of the environment in which the constant is evaluated (e.g. the function within which the constant is used) and a GlobalId. The GlobalId is made up of either an Instance referring to a constant or static, or an Instance of a function together with an index into the function's Promoted table.

Constant evaluation returns a Result with either the error, or the simplest representation of the constant. "simplest" meaning if it is representable as an integer or fat pointer, it will directly yield the value (via Value::ByVal or Value::ByValPair), instead of referring to the miri virtual memory allocation (via Value::ByRef). This means that the const_eval function cannot be used to create miri-pointers to the evaluated constant or static. If you need that, you need to directly work with the functions in src/librustc_mir/const_eval.rs.

Miri

Miri (MIR Interpreter) is a virtual machine for executing MIR without compiling to machine code. It is usually invoked via tcx.const_eval.

If you start out with a constant


const FOO: usize = 1 << 12;

rustc doesn't actually invoke anything until the constant is either used or placed into metadata.

Once you have a use-site like

type Foo = [u8; FOO - 42];

The compiler needs to figure out the length of the array before being able to create items that use the type (locals, constants, function arguments, ...).

To obtain the (in this case empty) parameter environment, one can call let param_env = tcx.param_env(length_def_id);. The GlobalId needed is

let gid = GlobalId {
    promoted: None,
    instance: Instance::mono(length_def_id),
};

Invoking tcx.const_eval(param_env.and(gid)) will now trigger the creation of the MIR of the array length expression. The MIR will look something like this:

const Foo::{{initializer}}: usize = {
    let mut _0: usize;                   // return pointer
    let mut _1: (usize, bool);

    bb0: {
        _1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize);
        assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1;
    }

    bb1: {
        _0 = (_1.0: usize);
        return;
    }
}

Before the evaluation, a virtual memory location (in this case essentially a vec![u8; 4] or vec![u8; 8]) is created for storing the evaluation result.

At the start of the evaluation, _0 and _1 are Value::ByVal(PrimVal::Undef). When the initialization of _1 is invoked, the value of the FOO constant is required, and triggers another call to tcx.const_eval, which will not be shown here. If the evaluation of FOO is successful, 42 will be subtracted from its value 4096 and the result stored in _1 as Value::ByValPair(PrimVal::Bytes(4054), PrimVal::Bytes(0)). The first part of the pair is the computed value, the second part is a bool that's true if an overflow happened.

The next statement asserts that said boolean is 0. In case the assertion fails, its error message is used for reporting a compile-time error.

Since it does not fail, Value::ByVal(PrimVal::Bytes(4054)) is stored in the virtual memory that was allocated before the evaluation. _0 always refers to that location directly.

After the evaluation is done, the virtual memory allocation is interned into the TyCtxt. Future evaluations of the same constants will not actually invoke miri, but just extract the value from the interned allocation.

The tcx.const_eval function has one additional feature: it will not return a ByRef(interned_allocation_id), but a ByVal(computed_value) if possible. This makes using the result much more convenient, as no further queries need to be executed in order to get at something as simple as a usize.
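
The arithmetic in this walkthrough can be reproduced in ordinary Rust. The sketch below uses the standard library's overflowing_sub to model the (value, overflow flag) pair that CheckedSub produces; the helper name checked_sub_pair is invented for this example.

```rust
const FOO: usize = 1 << 12;

// The array length is computed by constant evaluation at compile time.
type Foo = [u8; FOO - 42];

/// Models `CheckedSub`: the first part is the computed (wrapping) value,
/// the second part is true if an overflow happened.
fn checked_sub_pair(lhs: usize, rhs: usize) -> (usize, bool) {
    lhs.overflowing_sub(rhs)
}
```

Here checked_sub_pair(FOO, 42) yields (4054, false), and std::mem::size_of::<Foo>() is 4054, since the type is an array of that many bytes.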

Datastructures

Miri's core datastructures can be found in librustc/mir/interpret. This is mainly the error enum and the Value and PrimVal types. A Value can be either ByVal (a single PrimVal), ByValPair (two PrimVals, usually fat pointers or two element tuples) or ByRef, which is used for anything else and refers to a virtual allocation. These allocations can be accessed via the methods on tcx.interpret_interner.

If you are expecting a numeric result, you can use unwrap_u64 (panics on anything that can't be represented as a u64) or to_raw_bits which results in an Option<u128> yielding the ByVal if possible.

Allocations

A miri allocation is either a byte sequence of the memory or an Instance in the case of function pointers. Byte sequences can additionally contain relocations that mark a group of bytes as a pointer to another allocation. The actual bytes at the relocation refer to the offset inside the other allocation.

These allocations exist so that references and raw pointers have something to point to. There is no global linear heap in which things are allocated, but each allocation (be it for a local variable, a static or a (future) heap allocation) gets its own little memory with exactly the required size. So if you have a pointer to an allocation for a local variable a, there is no possible (no matter how unsafe) operation that you can do that would ever change said pointer to a pointer to b.

Interpretation

Although the main entry point to constant evaluation is the tcx.const_eval query, there are additional functions in librustc_mir/const_eval.rs that allow accessing the fields of a Value (ByRef or otherwise). You should never have to access an Allocation directly except for translating it to the compilation target (at the moment just LLVM).

Miri starts by creating a virtual stack frame for the current constant that is being evaluated. There's essentially no difference between a constant and a function with no arguments, except that constants do not allow local (named) variables at the time of writing this guide.

A stack frame is defined by the Frame type in librustc_mir/interpret/eval_context.rs and contains the memory of all the local variables (None at the start of evaluation). Each frame refers to the evaluation of either the root constant or subsequent calls to const fn. The evaluation of another constant simply calls tcx.const_eval, which produces an entirely new and independent stack frame.

The frames are just a Vec<Frame>; there's no way to actually refer to a Frame's memory even if horrible shenanigans are done via unsafe code. The only memory that can be referred to are Allocations.

Miri now calls the step method (in librustc_mir/interpret/step.rs) until it either returns an error or has no further statements to execute. Each statement will now initialize or modify the locals or the virtual memory referred to by a local. This might require evaluating other constants or statics, which just recursively invokes tcx.const_eval.

Parameter Environment

When working with associated and/or generic items (types, constants, functions/methods) it is often relevant to have more information about the Self or generic parameters. Trait bounds and similar information are encoded in the ParamEnv. Often this is not enough information to obtain things like the type's Layout, but you can do all kinds of other checks on it (e.g. whether a type implements Copy) or you can evaluate an associated constant whose value does not depend on anything from the parameter environment.

For example if you have a function


fn foo<T: Copy>(t: T) {
}

the parameter environment for that function is [T: Copy]. This means any evaluation within this function will, when accessing the type T, know about its Copy bound via the parameter environment.
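
From the user's side, the effect of having T: Copy in the environment can be seen in a function body that relies on the bound. The function duplicate below is invented for this illustration.

```rust
/// Using `t` twice is only accepted because `T: Copy` is known inside
/// this function; without the bound, the second use would be a use of
/// a moved value.
fn duplicate<T: Copy>(t: T) -> (T, T) {
    (t, t)
}
```

Removing the Copy bound should make the body fail to compile, because nothing in the environment would then say that T can be copied.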

Although you can obtain a valid ParamEnv for any item via tcx.param_env(def_id), this ParamEnv can be too generic for your use case. Using the ParamEnv from the surrounding context can allow you to evaluate more things.

Another great thing about ParamEnv is that you can use it to bundle the thing depending on generic parameters (e.g. a Ty) by calling param_env.and(ty). This will produce a ParamEnvAnd<Ty>, making clear that you should probably not be using the inner value without taking care to also use the ParamEnv.

Code generation

Code generation or "codegen" is the part of the compiler that actually generates an executable binary. rustc uses LLVM for code generation.

NOTE: If you are looking for hints on how to debug code generation bugs, please see this section of the debugging chapter.

What is LLVM?

All of the preceding chapters of this guide have one thing in common: we never generated any executable machine code at all! With this chapter, all of that changes.

Like most compilers, rustc is composed of a "frontend" and a "backend". The "frontend" is responsible for taking raw source code, checking it for correctness, and getting it into a format X from which we can generate executable machine code. The "backend" then takes that format X and produces (possibly optimized) executable machine code for some platform. All of the previous chapters deal with rustc's frontend.

rustc's backend is LLVM, "a collection of modular and reusable compiler and toolchain technologies". In particular, the LLVM project contains a pluggable compiler backend (also called "LLVM"), which is used by many compiler projects, including the clang C compiler and our beloved rustc.

LLVM's "format X" is called LLVM IR. It is basically assembly code with additional low-level types and annotations added. These annotations are helpful for doing optimizations on the LLVM IR and outputted machine code. The end result of all this is (at long last) something executable (e.g. an ELF object or wasm).
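To look at LLVM IR for yourself, a small function is enough. The file name add.rs is an assumption for this example, and the exact contents of the output depend on the compiler version and optimization level.

```rust
// Save as `add.rs` (a hypothetical file name) and compile with
//     rustc --crate-type=lib --emit=llvm-ir add.rs
// which should write a textual `.ll` file holding the LLVM IR that
// rustc generated for this crate.
pub fn add(a: u32, b: u32) -> u32 {
    // `wrapping_add` keeps the generated IR free of overflow checks.
    a.wrapping_add(b)
}
```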

There are a few benefits to using LLVM:

  • We don't have to write a whole compiler backend. This reduces implementation and maintenance burden.
  • We benefit from the large suite of advanced optimizations that the LLVM project has been collecting.
  • We automatically can compile Rust to any of the platforms for which LLVM has support. For example, as soon as LLVM added support for wasm, voila! rustc, clang, and a bunch of other languages were able to compile to wasm! (Well, there was some extra stuff to be done, but we were 90% there anyway).
  • We and other compiler projects benefit from each other. For example, when the Spectre and Meltdown security vulnerabilities were discovered, only LLVM needed to be patched.

Generating LLVM IR

TODO

Updating LLVM

The Rust compiler uses LLVM as its primary codegen backend today, and naturally we want to at least occasionally update this dependency! Currently we do not have a strict policy about when to update LLVM or what it can be updated to, but a few guidelines are applied:

  • We try to always support the latest released version of LLVM.
  • We try to support the "last few" versions of LLVM (how many is changing over time).
  • We allow moving to arbitrary commits during development.
  • We strongly prefer to upstream all patches to LLVM before including them in rustc.

This policy may change over time (or may actually start to exist as a formal policy!), but for now these are rough guidelines!

Why update LLVM?

There are two primary reasons nowadays that we want to update LLVM in one way or another:

  • First, a bug could have been fixed! Often we find bugs in the compiler and fix them upstream in LLVM. We'll want to pull fixes back to the compiler itself as they're merged upstream.

  • Second, a new feature may be available in LLVM that we want to use in rustc, but we don't want to wait for a full LLVM release to test it out.

Each of these reasons has a different strategy for updating LLVM, and we'll go over both in detail here.

Bugfix Updates

For updates of LLVM that just pull in a bug fix, we cherry-pick the bugfix to the branch we're already using. The steps for this are:

  1. Make sure the bugfix is in upstream LLVM.
  2. Identify the branch that rustc is currently using. The src/llvm submodule is always pinned to a branch of the rust-lang/llvm repository.
  3. Fork the rust-lang/llvm repository
  4. Check out the appropriate branch (typically named rust-llvm-release-*)
  5. Cherry-pick the upstream commit onto the branch
  6. Push this branch to your fork
  7. Send a Pull Request to rust-lang/llvm to the same branch as before
  8. Wait for the PR to be merged
  9. Send a PR to rust-lang/rust updating the src/llvm submodule with your bugfix
  10. Wait for PR to be merged

The tl;dr; is that we can cherry-pick bugfixes at any time and pull them back into the rust-lang/llvm branch that we're using, and getting it into the compiler is just updating the submodule via a PR!

Example PRs look like: #56313

Feature updates

Note that all of this information applies to the current day and age. The process for updating LLVM changes with practically every LLVM update, so this may be out of date!

Unlike bugfixes, updating to pick up a new feature of LLVM typically requires a lot more work. This is where we can't reasonably cherry-pick commits backwards so we need to do a full update. There's a lot of stuff to do here, so let's go through each in detail.

  1. Create new branches in all repositories for this update. Branches should be named rust-llvm-release-X-Y-Z-vA where X.Y.Z is the LLVM version and A is just increasing based on if there's previous branches of this name. All repositories here should be branched at the same time from the upstream LLVM projects, we currently use https://github.com/llvm-mirror repositories. The list of repositories that need a new branch are:

    • rust-lang/llvm
    • rust-lang/compiler-rt
    • rust-lang/lld
    • rust-lang-nursery/lldb
    • rust-lang-nursery/clang
  2. Apply Rust-specific patches to LLVM repositories. All features and bugfixes are upstream, but there's often some weird build-related patches that don't make sense to upstream which we have on our repositories. These patches are typically the latest patches on the branch. All repositories, except clang, currently have Rust-specific patches.

  3. Update the compiler-rt submodule in the rust-lang-nursery/compiler-builtins repository. Push this update to a rust-llvm-release-* branch of the compiler-builtins repository.

  4. Prepare a commit to rust-lang/rust

  • Update src/llvm

  • Update src/tools/lld

  • Update src/tools/lldb

  • Update src/tools/clang

  • Update src/libcompiler_builtins

  • Edit src/rustllvm/llvm-rebuild-trigger to update its contents

  5. Build your commit. Make sure you've committed the previous changes to ensure submodule updates aren't reverted. Some commands you should execute are:

    • ./x.py build src/llvm - test that LLVM still builds
    • ./x.py build src/tools/lld - same for LLD
    • ./x.py build - build the rest of rustc

    You'll likely need to update src/rustllvm/*.cpp to compile with updated LLVM bindings. Note that you should use #ifdef and such to ensure that the bindings still compile on older LLVM versions.

  6. Test for regressions across other platforms. LLVM often has at least one bug for non-tier-1 architectures, so it's good to do some more testing before sending this to bors! If you're low on resources you can send the PR as-is now to bors, though, and it'll get tested anyway.

    Ideally, build LLVM and test it on a few platforms:

    • Linux
    • OSX
    • Windows

    and afterwards run some docker containers that CI also does:

    • ./src/ci/docker/run.sh wasm32-unknown
    • ./src/ci/docker/run.sh arm-android
    • ./src/ci/docker/run.sh dist-various-1
    • ./src/ci/docker/run.sh dist-various-2
    • ./src/ci/docker/run.sh armhf-gnu
  7. Send a PR! Hopefully it's smooth sailing from here :).

For prior art, previous LLVM updates look like #55835 and #47828.

Caveats and gotchas

Ideally the above instructions are pretty smooth, but here are some caveats to keep in mind while going through them:

  • LLVM bugs are hard to find, so don't hesitate to ask for help! Bisection is definitely your friend here (yes, LLVM takes forever to build, yet bisection is still your friend).
  • LLDB currently has some Rust-specific patches that aren't upstream. If you have difficulty updating it, @tromey can likely help out.
  • If you've got general questions, @alexcrichton can help you out.
  • Creating branches is a privileged operation on GitHub, so you'll need someone with write access to create the branches for you most likely.

Emitting Diagnostics

A lot of effort has been put into making rustc have great error messages. This chapter is about how to emit compile errors and lints from the compiler.

Span

Span is the primary data structure in rustc used to represent a location in the code being compiled. Spans are attached to most constructs in HIR and MIR, allowing for more informative error reporting.

A Span can be looked up in a SourceMap to get a "snippet" useful for displaying errors with span_to_snippet and other similar methods on the SourceMap.
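As a rough mental model, a span can be thought of as a byte range into the source text. The sketch below is a toy: the Span and SourceMap types here are simplified stand-ins reduced to a single in-memory file, and only the method name span_to_snippet is borrowed from the text above.

```rust
// Toy model: simplified stand-ins for the rustc types of the same name.

/// A half-open byte range `lo..hi` into the source text.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// Holds the text of one source file.
struct SourceMap {
    src: String,
}

impl SourceMap {
    /// Returns the text covered by `sp`, or an error if the span is out
    /// of bounds or does not fall on character boundaries.
    fn span_to_snippet(&self, sp: Span) -> Result<String, String> {
        self.src
            .get(sp.lo..sp.hi)
            .map(|s| s.to_owned())
            .ok_or_else(|| format!("invalid span {:?}", sp))
    }
}
```

Returning a Result rather than panicking matches how the snippet lookup is used in the examples in this chapter, where the caller falls back to a plain help message on failure.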

Error messages

The rustc_errors crate defines most of the utilities used for reporting errors.

Session and ParseSess have methods (or fields with methods) that allow reporting errors. These methods usually have names like span_err or struct_span_err or span_warn, etc... There are lots of them; they emit different types of "errors", such as warnings, errors, fatal errors, suggestions, etc.

In general, there are two classes of such methods: ones that emit an error directly and ones that allow finer control over what to emit. For example, span_err emits the given error message at the given Span, but struct_span_err instead returns a DiagnosticBuilder.

DiagnosticBuilder allows you to add related notes and suggestions to an error before emitting it by calling the emit method. (Failing to either emit or cancel a DiagnosticBuilder will result in an ICE.) See the docs for more info on what you can do.

// Get a DiagnosticBuilder. This does _not_ emit an error yet.
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

// In some cases, you might need to check if `sp` is generated by a macro to
// avoid printing weird errors about macro-generated code.

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    // Use the snippet to generate a suggested fix
    err.span_suggestion(suggestion_sp, "try using a qux here", format!("qux {}", snippet));
} else {
    // If we weren't able to generate a snippet, then emit a "help" message
    // instead of a concrete "suggestion". In practice this is unlikely to be
    // reached.
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

// emit the error
err.emit();
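The rule that a builder must be either emitted or cancelled can be enforced with a Drop implementation. The following is a self-contained sketch of that idea; ToyDiagnosticBuilder and its methods are hypothetical and much simpler than the real DiagnosticBuilder, and the toy panics where rustc would report an ICE.

```rust
// Toy model of the "emit or cancel" rule. `ToyDiagnosticBuilder` is a
// hypothetical stand-in, not the real `DiagnosticBuilder`.
struct ToyDiagnosticBuilder {
    message: String,
    notes: Vec<String>,
    handled: bool,
}

impl ToyDiagnosticBuilder {
    fn new(message: &str) -> Self {
        ToyDiagnosticBuilder { message: message.to_owned(), notes: Vec::new(), handled: false }
    }

    /// Attaches a note; returns `&mut Self` so calls can be chained.
    fn note(&mut self, note: &str) -> &mut Self {
        self.notes.push(note.to_owned());
        self
    }

    /// Renders the diagnostic and marks it as handled.
    fn emit(&mut self) -> String {
        self.handled = true;
        let mut out = format!("error: {}", self.message);
        for note in &self.notes {
            out.push_str(&format!("\n  = note: {}", note));
        }
        out
    }

    /// Discards the diagnostic without reporting it.
    fn cancel(&mut self) {
        self.handled = true;
    }
}

impl Drop for ToyDiagnosticBuilder {
    fn drop(&mut self) {
        // Skip the check while already unwinding to avoid a double panic.
        if !self.handled && !std::thread::panicking() {
            panic!("diagnostic was constructed but never emitted or cancelled");
        }
    }
}
```

The point of the design is that forgetting to report an error becomes a loud failure instead of a silently swallowed diagnostic.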

Suggestions

In addition to telling the user exactly why their code is wrong, it's often also possible to tell them how to fix it. To this end, DiagnosticBuilder offers a structured suggestions API, which formats code suggestions pleasingly in the terminal, or (when the --error-format json flag is passed) as JSON for consumption by tools, most notably the Rust Language Server and rustfix.

Not all suggestions should be applied mechanically. Use the span_suggestion_with_applicability method of DiagnosticBuilder to make a suggestion while providing a hint to tools whether the suggestion is mechanically applicable or not.

For example, to make our qux suggestion machine-applicable, we would do:

let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    // Add applicability info!
    err.span_suggestion_with_applicability(
        suggestion_sp,
        "try using a qux here",
        format!("qux {}", snippet),
        Applicability::MachineApplicable,
    );
} else {
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

err.emit();

This might emit an error like

$ rustc mycode.rs
error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^ help: try using a qux here: `qux sad()`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.

In some cases, like when the suggestion spans multiple lines or when there are multiple suggestions, the suggestions are displayed on their own:

error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^
help: try using a qux here:
  |
3 |     qux sad()
  |     ^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.

The possible Applicability values are:

  • MachineApplicable: Can be applied mechanically.
  • HasPlaceholders: Cannot be applied mechanically because it has placeholder text in the suggestions. For example, "Try adding a type: `let x: <type>`".
  • MaybeIncorrect: Cannot be applied mechanically because the suggestion may or may not be a good one.
  • Unspecified: Cannot be applied mechanically because we don't know which of the above cases it falls into.
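To see why the distinction matters to tools, here is a sketch of how a fixer might filter suggestions before applying them. The variant names follow the list above; the Suggestion struct and apply_machine_applicable function are hypothetical, and the sketch assumes that suggestions do not overlap.

```rust
// Toy model: simplified stand-ins for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Applicability {
    MachineApplicable,
    HasPlaceholders,
    MaybeIncorrect,
    Unspecified,
}

struct Suggestion {
    /// Byte range `lo..hi` in the source that the replacement covers.
    lo: usize,
    hi: usize,
    replacement: String,
    applicability: Applicability,
}

/// Applies only the suggestions a tool may apply without human review.
fn apply_machine_applicable(src: &str, suggestions: &[Suggestion]) -> String {
    let mut fixes: Vec<&Suggestion> = suggestions
        .iter()
        .filter(|s| s.applicability == Applicability::MachineApplicable)
        .collect();
    // Apply from the end of the file backwards so that earlier byte
    // offsets stay valid after each replacement.
    fixes.sort_by(|a, b| b.lo.cmp(&a.lo));
    let mut out = src.to_owned();
    for s in fixes {
        out.replace_range(s.lo..s.hi, &s.replacement);
    }
    out
}
```

Anything not marked MachineApplicable is left untouched here, on the assumption that a human should look at it first.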

Lints

The compiler linting infrastructure is defined in the rustc::lint module.

Declaring a lint

The built-in compiler lints are defined in the rustc_lint crate.

Each lint is defined as a struct that implements the LintPass trait. The trait implementation allows you to check certain syntactic constructs as the linter walks the source code. You can then choose to emit lints in a very similar way to compile errors. Finally, you register the lint to actually get it to be run by the compiler by using the declare_lint! macro.

For example, the following lint checks for uses of while true { ... } and suggests using loop { ... } instead.

// Declare a lint called `WHILE_TRUE`
declare_lint! {
    WHILE_TRUE,

    // warn-by-default
    Warn,

    // This string is the lint description
    "suggest using `loop { }` instead of `while true { }`"
}

// Define a struct and `impl LintPass` for it.
#[derive(Copy, Clone)]
pub struct WhileTrue;

impl LintPass for WhileTrue {
    fn get_lints(&self) -> LintArray {
        lint_array!(WHILE_TRUE)
    }
}

// LateLintPass has lots of methods. We only override the definition of
// `check_expr` for this lint because that's all we need, but you could
// override other methods for your own lint. See the rustc docs for a full
// list of methods.
impl<'a, 'tcx> LateLintPass<'a, 'tcx> for WhileTrue {
    fn check_expr(&mut self, cx: &LateContext, e: &hir::Expr) {
        if let hir::ExprWhile(ref cond, ..) = e.node {
            if let hir::ExprLit(ref lit) = cond.node {
                if let ast::LitKind::Bool(true) = lit.node {
                    if lit.span.ctxt() == SyntaxContext::empty() {
                        let msg = "denote infinite loops with `loop { ... }`";
                        let condition_span = cx.tcx.sess.source_map().def_span(e.span);
                        let mut err = cx.struct_span_lint(WHILE_TRUE, condition_span, msg);
                        err.span_suggestion_short(condition_span, "use `loop`", "loop".to_owned());
                        err.emit();
                    }
                }
            }
        }
    }
}

Edition-gated Lints

Sometimes we want to change the behavior of a lint in a new edition. To do this, we just add the transition to our invocation of declare_lint!:

declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    Edition::Edition2018 => Warn,
}

This makes the ANONYMOUS_PARAMETERS lint allow-by-default in the 2015 edition but warn-by-default in the 2018 edition.
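The effect of the edition transition can be modelled as a lookup that depends on the edition. This is a toy sketch: Edition, Level, ToyLint and level_for are hypothetical names, and the real lint level computation also takes attributes and command line flags into account.

```rust
// Toy model: `Edition`, `Level` and `ToyLint` are simplified stand-ins.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
enum Edition {
    Edition2015,
    Edition2018,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Allow,
    Warn,
}

struct ToyLint {
    name: &'static str,
    default_level: Level,
    /// From this edition onwards, use the given level instead.
    edition_override: Option<(Edition, Level)>,
}

impl ToyLint {
    /// Default level of this lint when compiling for `edition`.
    fn level_for(&self, edition: Edition) -> Level {
        match self.edition_override {
            Some((since, level)) if edition >= since => level,
            _ => self.default_level,
        }
    }
}
```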

Lints that represent an incompatibility (i.e. error) in the upcoming edition should also be registered as FutureIncompatibilityLints in the register_builtins function in rustc_lint::lib.

Lint Groups

Lints can be turned on in groups. These groups are declared in the register_builtins function in rustc_lint::lib. The add_lint_group! macro is used to declare a new group.

For example,

    add_lint_group!(sess,
                    "nonstandard_style",
                    NON_CAMEL_CASE_TYPES,
                    NON_SNAKE_CASE,
                    NON_UPPER_CASE_GLOBALS);

This defines the nonstandard_style group which turns on the listed lints. A user can turn on these lints with a #![warn(nonstandard_style)] attribute in the source code, or by passing -W nonstandard-style on the command line.
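From the user's side, a group name can be used anywhere a single lint name can. The sketch below uses allow on a module rather than warn on the whole crate so that it compiles without warnings; the module and item names are made up.

```rust
// `nonstandard_style` covers the three naming lints listed above, so a
// single attribute controls all of them for everything in this module.
#[allow(nonstandard_style)]
mod legacy_names {
    // Would otherwise trigger `non_camel_case_types`.
    pub struct lower_case_type;

    // Would otherwise trigger `non_snake_case`.
    pub fn CamelCaseFunction() -> u32 {
        7
    }

    // Would otherwise trigger `non_upper_case_globals`.
    pub const lower_case_const: u32 = 1;
}
```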

Linting early in the compiler

On occasion, you may need to define a lint that runs before the linting system has been initialized (e.g. during parsing or macro expansion). This is problematic because we need to have computed lint levels to know whether we should emit a warning or an error or nothing at all.

To solve this problem, we buffer the lints until the linting system is processed. Session and ParseSess both have buffer_lint methods that allow you to buffer a lint for later. The linting system automatically takes care of handling buffered lints later.

Thus, to define a lint that runs early in the compilation, one defines a lint like normal but invokes the lint with buffer_lint.
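The buffering idea can be sketched independently of the compiler. In this toy model, ToySession, BufferedLint, Level and flush are all hypothetical names; only the buffer_lint method name is taken from the text, and its signature here is invented.

```rust
use std::collections::HashMap;

// Toy model: all types here are hypothetical stand-ins for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Allow,
    Warn,
    Deny,
}

struct BufferedLint {
    lint_name: &'static str,
    message: String,
}

#[derive(Default)]
struct ToySession {
    buffered: Vec<BufferedLint>,
}

impl ToySession {
    /// Called early (e.g. while parsing), before lint levels are known.
    fn buffer_lint(&mut self, lint_name: &'static str, message: &str) {
        self.buffered.push(BufferedLint { lint_name, message: message.to_owned() });
    }

    /// Called once lint levels have been computed: drains the buffer and
    /// reports each lint at its level, dropping anything set to `Allow`.
    /// Lints without an explicit level are treated as warnings here.
    fn flush(&mut self, levels: &HashMap<&str, Level>) -> Vec<String> {
        self.buffered
            .drain(..)
            .filter_map(|lint| {
                match levels.get(lint.lint_name).copied().unwrap_or(Level::Warn) {
                    Level::Allow => None,
                    Level::Warn => Some(format!("warning: {}", lint.message)),
                    Level::Deny => Some(format!("error: {}", lint.message)),
                }
            })
            .collect()
    }
}
```

The early code only records what it saw; the decision about warning, error or nothing is deferred until the levels exist.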

Linting even earlier in the compiler

The parser (libsyntax) is interesting in that it cannot have dependencies on any of the other librustc* crates. In particular, it cannot depend on librustc::lint or librustc_lint, where all of the compiler linting infrastructure is defined. That's troublesome!

To solve this, libsyntax defines its own buffered lint type, which ParseSess::buffer_lint uses. After macro expansion, these buffered lints are then dumped into the Session::buffered_lints used by the rest of the compiler.

Usage for buffered lints in libsyntax is pretty much the same as in the rest of the compiler, with one exception: we cannot import the LintIds for lints we want to emit. Instead, the BufferedEarlyLintId type is used. If you are defining a new lint, you will want to add an entry to this enum. Then, add an appropriate mapping to the body of Lint::from_parser_lint_id.

Appendix A: A tutorial on creating a drop-in replacement for rustc

Note: This is a copy of @nrc's amazing stupid-stats. You should find a copy of the code on the GitHub repository although due to the compiler's constantly evolving nature, there is no guarantee it'll compile on the first go.

Many tools benefit from being a drop-in replacement for a compiler. By this, I mean that any user of the tool can use mytool in all the ways they would normally use rustc - whether manually compiling a single file or as part of a complex make project or Cargo build, etc. That could be a lot of work; rustc, like most compilers, takes a large number of command line arguments which can affect compilation in complex and interacting ways. Emulating all of this behaviour in your tool is annoying at best, especially if you are making many of the same calls into librustc that the compiler is.

The kind of things I have in mind are tools like rustdoc or a future rustfmt. These want to operate as closely as possible to real compilation, but have totally different outputs (documentation and formatted source code, respectively). Another use case is a customised compiler. Say you want to add a custom code generation phase after macro expansion, then creating a new tool should be easier than forking the compiler (and keeping it up to date as the compiler evolves).

I have gradually been trying to improve the API of librustc to make creating a drop-in tool easier to produce (many others have also helped improve these interfaces over the same time frame). It is now pretty simple to make a tool which is as close to rustc as you want it to be. In this tutorial I'll show how.

Note/warning: everything I talk about in this tutorial is internal API for rustc. It is all extremely unstable and likely to change often and in unpredictable ways. Maintaining a tool which uses these APIs will be non-trivial, although hopefully easier than maintaining one that does similar things without using them.

This tutorial starts with a very high level view of the rustc compilation process and of some of the code that drives compilation. Then I'll describe how that process can be customised. In the final section of the tutorial, I'll go through an example - stupid-stats - which shows how to build a drop-in tool.

Overview of the compilation process

Compilation using rustc happens in several phases. We start with parsing, this includes lexing. The output of this phase is an AST (abstract syntax tree). There is a single AST for each crate (indeed, the entire compilation process operates over a single crate). Parsing abstracts away details about individual files which will all have been read in to the AST in this phase. At this stage the AST includes all macro uses, attributes will still be present, and nothing will have been eliminated due to cfgs.

The next phase is configuration and macro expansion. This can be thought of as a function over the AST. The unexpanded AST goes in and an expanded AST comes out. Macros and syntax extensions are expanded, and cfg attributes will cause some code to disappear. The resulting AST won't have any macros or macro uses left in.

The code for these first two phases is in libsyntax.

After this phase, the compiler allocates ids to each node in the AST (technically not every node, but most of them). If we are writing out dependencies, that happens now.

The next big phase is analysis. This is the most complex phase and uses the bulk of the code in rustc. This includes name resolution, type checking, borrow checking, type and lifetime inference, trait selection, method selection, linting, and so forth. Most error detection is done in this phase (although parse errors are found during parsing). The 'output' of this phase is a bunch of side tables containing semantic information about the source program. The analysis code is in librustc and a bunch of other crates with the 'librustc_' prefix.

Next is translation, this translates the AST (and all those side tables) into LLVM IR (intermediate representation). We do this by calling into the LLVM libraries, rather than actually writing IR directly to a file. The code for this is in librustc_trans.

The next phase is running the LLVM backend. This runs LLVM's optimisation passes on the generated IR and then generates machine code. The result is object files. This phase is all done by LLVM, it is not really part of the rust compiler. The interface between LLVM and rustc is in librustc_llvm.

Finally, we link the object files into an executable. Again we outsource this to other programs and it's not really part of the rust compiler. The interface is in librustc_back (which also contains some things used primarily during translation).

NOTE: librustc_trans and librustc_back no longer exist, and we don't translate AST or HIR directly to LLVM IR anymore. Instead, see librustc_codegen_llvm and librustc_codegen_utils.

All these phases are coordinated by the driver. To see the exact sequence, look at the compile_input function in librustc_driver. The driver handles all the highest level coordination of compilation:

  1. handling command-line arguments
  2. maintaining compilation state (primarily in the Session)
  3. calling the appropriate code to run each phase of compilation
  4. handling high level coordination of pretty printing and testing

To create a drop-in compiler replacement or a custom compiler, we leave most of compilation alone and customise the driver using its APIs.

The driver customisation APIs

There are two primary ways to customise compilation - high level control of the driver using CompilerCalls and controlling each phase of compilation using a CompileController. The former lets you customise handling of command line arguments etc., the latter lets you stop compilation early or execute code between phases.

CompilerCalls

CompilerCalls is a trait that you implement in your tool. It contains a fairly ad-hoc set of methods to hook in to the process of processing command line arguments and driving the compiler. For details, see the comments in librustc_driver/lib.rs. I'll summarise the methods here.

early_callback and late_callback let you call arbitrary code at different points - early is after command line arguments have been parsed, but before anything is done with them; late is pretty much the last thing before compilation starts, i.e., after all processing of command line arguments, etc. is done. Currently, you get to choose whether compilation stops or continues at each point, but you don't get to change anything the driver has done. You can record some info for later, or perform other actions of your own.

some_input and no_input give you an opportunity to modify the primary input to the compiler (usually the input is a file containing the top module for a crate, but it could also be a string). You could record the input or perform other actions of your own.

Ignore parse_pretty, it is unfortunate and hopefully will get improved. There is a default implementation, so you can pretend it doesn't exist.

build_controller returns a CompileController object for more fine-grained control of compilation; it is described next.

We might add more options in the future.

CompileController

CompileController is a struct consisting of PhaseControllers and flags. Currently, there is only one flag, make_glob_map, which signals whether to produce a map of glob imports (used by save-analysis and potentially other tools). There are probably flags in the session that should be moved here.

There is a PhaseController for each of the phases described in the above summary of compilation (and we could add more in the future for finer-grained control). They are all after_ a phase because they are checked at the end of a phase (again, that might change), e.g., CompileController::after_parse controls what happens immediately after parsing (and before macro expansion).

Each PhaseController contains a flag called stop which indicates whether compilation should stop or continue, and a callback to be executed at the point indicated by the phase. The callback is called whether or not compilation continues.

Information about the state of compilation is passed to these callbacks in a CompileState object. This contains all the information the compiler has. Note that this state information is immutable - your callback can only execute code using the compiler state, it can't modify the state. (If there is demand, we could change that). The state available to a callback depends on where during compilation the callback is called. For example, after parsing there is an AST but no semantic analysis (because the AST has not been analysed yet). After translation, there is translation info, but no AST or analysis info (since these have been consumed/forgotten).
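The interaction between the stop flag and the callback can be sketched with a toy driver loop. Everything below is a simplified model: the type names echo the ones in the text, but the fields, the basic constructor and run_phases are written for this example, and the callback receives only the phase name rather than a CompileState.

```rust
// Toy model: simplified stand-ins, not the real `rustc_driver` types.
#[derive(Clone, Copy, PartialEq)]
enum Compilation {
    Stop,
    Continue,
}

struct PhaseController<'a> {
    stop: Compilation,
    callback: Box<dyn FnMut(&str) + 'a>,
}

impl<'a> PhaseController<'a> {
    /// Default behaviour: keep compiling and do nothing in the callback.
    fn basic() -> Self {
        PhaseController {
            stop: Compilation::Continue,
            callback: Box::new(|_: &str| {}),
        }
    }
}

/// Runs the named phases in order. After each phase its callback is
/// invoked (whether or not compilation continues), and then the `stop`
/// flag decides whether to go on. Returns the phases that actually ran.
fn run_phases<'a>(
    phases: &mut [(&'static str, PhaseController<'a>)],
) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for (name, control) in phases.iter_mut() {
        // A real driver would do the work of the phase here.
        ran.push(*name);
        (control.callback)(*name);
        if control.stop == Compilation::Stop {
            break;
        }
    }
    ran
}
```

Setting stop to Compilation::Stop on the first controller makes the loop end after the first phase, while its callback still runs.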

An example - stupid-stats

Our example tool is very simple: it collects some simple and not very useful statistics about a program; it is called stupid-stats. You can find the (more heavily commented) complete source for the example on GitHub. To build, just do cargo build. To run on a file foo.rs, do cargo run foo.rs (assuming you have a Rust program called foo.rs). You can also pass any command line arguments that you would normally pass to rustc. When you run it you'll see output similar to

In crate: foo,

Found 12 uses of `println!`;
The most common number of arguments is 1 (67% of all functions);
25% of functions have four or more arguments.

To make things easier, when we talk about functions, we're excluding methods and closures.
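The percentages in output like this can be derived from a histogram of argument counts. The helper below is a hypothetical sketch written for this guide, not the stupid-stats source; it assumes that arg_counts[n] holds the number of functions seen with exactly n arguments.

```rust
/// `arg_counts[n]` is the number of functions seen with exactly `n`
/// arguments. Returns (most common arity, its percentage, percentage of
/// functions with four or more arguments). Hypothetical helper for
/// illustration; not taken from the stupid-stats source.
fn compute_arg_stats(arg_counts: &[usize]) -> (usize, f64, f64) {
    let total: usize = arg_counts.iter().sum();
    if total == 0 {
        return (0, 0.0, 0.0);
    }

    // Most common arity; ties resolve to the smaller arity.
    let mut common = 0;
    for (arity, &count) in arg_counts.iter().enumerate() {
        if count > arg_counts[common] {
            common = arity;
        }
    }

    let four_or_more: usize = arg_counts.iter().skip(4).sum();
    let percent = |n: usize| 100.0 * n as f64 / total as f64;
    (common, percent(arg_counts[common]), percent(four_or_more))
}
```

For instance, a histogram of [1, 8, 0, 0, 3] describes twelve functions, eight of which take one argument and three of which take four.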

You can also use the executable as a drop-in replacement for rustc, because after all, that is the whole point of this exercise. So, however you use rustc in your makefile setup, you can use target/stupid (or whatever executable you end up with) instead. That might mean setting an environment variable or it might mean renaming your executable to rustc and setting your PATH. Similarly, if you're using Cargo, you'll need to rename the executable to rustc and set the PATH. Alternatively, you should be able to use multirust to get around all the PATH stuff (although I haven't actually tried that).

(Note that this example prints to stdout. I'm not entirely sure what Cargo does with stdout from rustc under different circumstances. If you don't see any output, try inserting a panic! after the println!s to error out, then Cargo should dump stupid-stats' stdout to Cargo's stdout).

Let's start with the main function for our tool, it is pretty simple:

fn main() {
    let args: Vec<_> = std::env::args().collect();
    rustc_driver::run_compiler(&args, &mut StupidCalls::new());
    std::env::set_exit_status(0);
}

The first line grabs any command line arguments. The second line calls the compiler driver with those arguments. The final line sets the exit code for the program.

The only interesting thing is the StupidCalls object we pass to the driver. This is our implementation of the CompilerCalls trait and is what will make this tool different from rustc.

StupidCalls is a mostly empty struct:

struct StupidCalls {
    default_calls: RustcDefaultCalls,
}

This tool is so simple that it doesn't need to store any data here, but usually you would. We embed a RustcDefaultCalls object to delegate to in our impl when we want exactly the same behaviour as the Rust compiler. Mostly you don't want to do that (or at least don't need to) in a tool. However, Cargo calls rustc with --print file-names, so we delegate in late_callback and no_input to keep Cargo happy.

Most of the rest of the impl of CompilerCalls is trivial:

impl<'a> CompilerCalls<'a> for StupidCalls {
    fn early_callback(&mut self,
                      _: &getopts::Matches,
                      _: &config::Options,
                      _: &diagnostics::registry::Registry,
                      _: ErrorOutputType)
                      -> Compilation {
        Compilation::Continue
    }

    fn late_callback(&mut self,
                     t: &TransCrate,
                     m: &getopts::Matches,
                     s: &Session,
                     c: &CrateStore,
                     i: &Input,
                     odir: &Option<PathBuf>,
                     ofile: &Option<PathBuf>)
                     -> Compilation {
        self.default_calls.late_callback(t, m, s, c, i, odir, ofile);
        Compilation::Continue
    }

    fn some_input(&mut self,
                  input: Input,
                  input_path: Option<Path>)
                  -> (Input, Option<Path>) {
        (input, input_path)
    }

    fn no_input(&mut self,
                m: &getopts::Matches,
                o: &config::Options,
                odir: &Option<Path>,
                ofile: &Option<Path>,
                r: &diagnostics::registry::Registry)
                -> Option<(Input, Option<Path>)> {
        self.default_calls.no_input(m, o, odir, ofile, r);

        // This is not optimal error handling.
        panic!("No input supplied to stupid-stats");
    }

    fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
        ...
    }
}

We don't do anything for either of the callbacks, nor do we change the input if the user supplies it. If they don't, we just panic!. This is the simplest way to handle the error, but it is not very user-friendly; a real tool would give a constructive message or perform a default action.

In build_controller we construct our CompileController. We only want to parse, and we want to inspect macros before expansion, so we make compilation stop after the first phase (parsing). The callback after that phase is where the tool does its actual work by walking the AST. We do that by creating an AST visitor and making it walk the AST from the top (the crate root). Once we've walked the crate, we print the stats we've collected:

fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
    // We mostly want to do what rustc does, which is what basic() will return.
    let mut control = driver::CompileController::basic();
    // But we only need the AST, so we can stop compilation after parsing.
    control.after_parse.stop = Compilation::Stop;

    // And when we stop after parsing we'll call this closure.
    // Note that this will give us an AST before macro expansions, which is
    // not usually what you want.
    control.after_parse.callback = box |state| {
        // Which extracts information about the compiled crate...
        let krate = state.krate.unwrap();

        // ...and walks the AST, collecting stats.
        let mut visitor = StupidVisitor::new();
        visit::walk_crate(&mut visitor, krate);

        // And finally prints out the stupid stats that we collected.
        let cratename = match attr::find_crate_name(&krate.attrs[]) {
            Some(name) => name.to_string(),
            None => String::from_str("unknown_crate"),
        };
        println!("In crate: {},\n", cratename);
        println!("Found {} uses of `println!`;", visitor.println_count);

        let (common, common_percent, four_percent) = visitor.compute_arg_stats();
        println!("The most common number of arguments is {} ({:.0}% of all functions);",
                 common, common_percent);
        println!("{:.0}% of functions have four or more arguments.", four_percent);
    };

    control
}

That is all it takes to create your own drop-in compiler replacement or custom compiler! For the sake of completeness I'll go over the rest of the stupid-stats tool.


struct StupidVisitor {
    println_count: usize,
    arg_counts: Vec<usize>,
}

The StupidVisitor struct just keeps track of the number of println!s it has seen and the count for each number of arguments. It implements syntax::visit::Visitor to walk the AST. Mostly we just use the default methods, which walk the AST taking no action. We override visit_item and visit_mac to implement custom behaviour when we walk into items (items include functions, modules, traits, structs, and so forth; we're only interested in functions) and macros:

impl<'v> visit::Visitor<'v> for StupidVisitor {
    fn visit_item(&mut self, i: &'v ast::Item) {
        match i.node {
            ast::Item_::ItemFn(ref decl, _, _, _, _) => {
                // Record the number of args.
                self.increment_args(decl.inputs.len());
            }
            _ => {}
        }

        // Keep walking.
        visit::walk_item(self, i)
    }

    fn visit_mac(&mut self, mac: &'v ast::Mac) {
        // Find its name and check if it is "println".
        let ast::Mac_::MacInvocTT(ref path, _, _) = mac.node;
        if path_to_string(path) == "println" {
            self.println_count += 1;
        }

        // Keep walking.
        visit::walk_mac(self, mac)
    }
}
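
The pattern used here (default methods that just keep walking, plus a couple of overrides) can be illustrated without any compiler internals. What follows is only a minimal sketch on a toy tree: the names Node, Visitor, walk_node, and CountingVisitor are hypothetical and are not part of libsyntax.

```rust
// A toy syntax tree; hypothetical, and far simpler than rustc's AST.
pub enum Node {
    // A function with a body.
    Function { body: Vec<Node> },
    // A macro invocation, identified by name only.
    MacroCall { name: String },
    // A module containing further items.
    Module { items: Vec<Node> },
}

pub trait Visitor: Sized {
    // The default methods take no action of their own; they just keep walking.
    fn visit_node(&mut self, node: &Node) {
        walk_node(self, node);
    }
    fn visit_macro(&mut self, _name: &str) {}
}

// Visits the children of `node`, dispatching back into the visitor.
pub fn walk_node<V: Visitor>(visitor: &mut V, node: &Node) {
    match node {
        Node::Function { body } => {
            for child in body {
                visitor.visit_node(child);
            }
        }
        Node::MacroCall { name } => visitor.visit_macro(name),
        Node::Module { items } => {
            for item in items {
                visitor.visit_node(item);
            }
        }
    }
}

#[derive(Default)]
pub struct CountingVisitor {
    pub println_count: usize,
    pub function_count: usize,
}

impl Visitor for CountingVisitor {
    // Override: count functions, then keep walking into the children.
    fn visit_node(&mut self, node: &Node) {
        if let Node::Function { .. } = node {
            self.function_count += 1;
        }
        walk_node(self, node);
    }

    // Override: count macro invocations named "println".
    fn visit_macro(&mut self, name: &str) {
        if name == "println" {
            self.println_count += 1;
        }
    }
}
```

Calling visit_node on the root with a CountingVisitor::default() fills in both counters. If an override forgets to call walk_node, the walk silently stops at that node, which is why the real visitor methods end with "Keep walking".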

The increment_args method increments the correct count in StupidVisitor::arg_counts. After we're done walking, compute_arg_stats does some pretty basic maths to come up with the stats we want about arguments.
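
The bodies of these two methods are not shown in this appendix. As a rough sketch of what they might look like, assume that arg_counts[n] holds the number of functions seen with exactly n arguments, and that compute_arg_stats returns the three values destructured in build_controller. The ArgStats type is a hypothetical stand-in for StupidVisitor, and the method bodies are assumptions rather than the tool's actual code.

```rust
// Hypothetical stand-in for the argument-counting half of StupidVisitor.
pub struct ArgStats {
    // arg_counts[n] is the number of functions seen with exactly n arguments.
    pub arg_counts: Vec<usize>,
}

impl ArgStats {
    pub fn new() -> ArgStats {
        ArgStats { arg_counts: Vec::new() }
    }

    // Record one more function taking `args` arguments, growing the vector
    // with zeros if this argument count has not been seen before.
    pub fn increment_args(&mut self, args: usize) {
        if self.arg_counts.len() <= args {
            self.arg_counts.resize(args + 1, 0);
        }
        self.arg_counts[args] += 1;
    }

    // Returns (most common argument count, percentage of functions with that
    // count, percentage of functions with four or more arguments).
    // Ties are resolved in favour of the smaller argument count.
    pub fn compute_arg_stats(&self) -> (usize, f64, f64) {
        let total: usize = self.arg_counts.iter().sum();
        if total == 0 {
            return (0, 0.0, 0.0);
        }
        let mut common = 0;
        for (n, &count) in self.arg_counts.iter().enumerate() {
            if count > self.arg_counts[common] {
                common = n;
            }
        }
        let four_or_more: usize = self.arg_counts.iter().skip(4).sum();
        let percent = |count: usize| 100.0 * count as f64 / total as f64;
        (common, percent(self.arg_counts[common]), percent(four_or_more))
    }
}
```

Indexing a Vec by argument count keeps increment_args trivial; a map would serve equally well if argument counts were sparse.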

What next?

These APIs are pretty new and have a long way to go until they're really good. If there are improvements you'd like to see or things you'd like to be able to do, let me know in a comment or GitHub issue. In particular, it's not clear to me exactly what extra flexibility is required. If you have an existing tool that would be suited to this setup, please try it out and let me know if you have problems.

It'd be great to see Rustdoc converted to using these APIs, if that is possible (although long term, I'd prefer to see Rustdoc run on the output from save-analysis, rather than doing its own analysis). Other parts of the compiler (e.g., pretty printing, testing) could be refactored to use these APIs internally (I already changed save-analysis to use CompileController). I've been experimenting with a prototype rustfmt which also uses these APIs.

Appendix B: Background topics

This section covers a number of common compiler terms that arise in this guide. We try to give the general definition while providing some Rust-specific context.

What is a control-flow graph?

A control-flow graph is a common term from compilers. If you've ever used a flow-chart, then the concept of a control-flow graph will be pretty familiar to you. It's a representation of your program that exposes the underlying control flow in a very clear way.

A control-flow graph is structured as a set of basic blocks connected by edges. The key idea of a basic block is that it is a set of statements that execute "together" – that is, whenever you branch to a basic block, you start at the first statement and then execute all the remainder. Only at the end of the block is there the possibility of branching to more than one place (in MIR, we call that final statement the terminator):

bb0: {
    statement0;
    statement1;
    statement2;
    ...
    terminator;
}

Many expressions that you are used to in Rust compile down to multiple basic blocks. For example, consider an if statement:

a = 1;
if some_variable {
    b = 1;
} else {
    c = 1;
}
d = 1;

This would compile into four basic blocks:

BB0: {
    a = 1;
    if some_variable { goto BB1 } else { goto BB2 }
}

BB1: {
    b = 1;
    goto BB3;
}

BB2: {
    c = 1;
    goto BB3;
}

BB3: {
    d = 1;
    ...;
}

When using a control-flow graph, a loop simply appears as a cycle in the graph, and the break keyword translates into a path out of that cycle.
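
To make the structure concrete, here is a small sketch of a control-flow graph as plain data, together with a depth-first search that reports whether the graph contains a cycle. All of the names (Terminator, BasicBlock, has_cycle, if_example, loop_example) are made up for illustration and do not correspond to MIR's actual types. Blocks are identified by their index in a slice, and block 0 is assumed to be the entry.

```rust
// How a basic block ends: the only place where control can branch.
pub enum Terminator {
    Goto(usize),
    If { then_block: usize, else_block: usize },
    Return,
}

pub struct BasicBlock {
    pub statements: Vec<&'static str>,
    pub terminator: Terminator,
}

impl BasicBlock {
    // The blocks that control may flow to once this block finishes.
    pub fn successors(&self) -> Vec<usize> {
        match &self.terminator {
            Terminator::Goto(target) => vec![*target],
            Terminator::If { then_block, else_block } => vec![*then_block, *else_block],
            Terminator::Return => vec![],
        }
    }
}

// Depth-first search from block 0. A cycle exists if we reach a block that
// is still on the current search path (state 1); state 2 means fully explored.
pub fn has_cycle(blocks: &[BasicBlock]) -> bool {
    fn visit(blocks: &[BasicBlock], bb: usize, state: &mut Vec<u8>) -> bool {
        match state[bb] {
            1 => return true,
            2 => return false,
            _ => {}
        }
        state[bb] = 1;
        for succ in blocks[bb].successors() {
            if visit(blocks, succ, state) {
                return true;
            }
        }
        state[bb] = 2;
        false
    }
    !blocks.is_empty() && visit(blocks, 0, &mut vec![0; blocks.len()])
}

// The four blocks of the `if` example (ending BB3 with Return is an assumption).
pub fn if_example() -> Vec<BasicBlock> {
    vec![
        BasicBlock {
            statements: vec!["a = 1"],
            terminator: Terminator::If { then_block: 1, else_block: 2 },
        },
        BasicBlock { statements: vec!["b = 1"], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["c = 1"], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["d = 1"], terminator: Terminator::Return },
    ]
}

// `loop { if done { break } work(); }`: bb1 jumps back to bb0 (the cycle),
// and `break` is the edge from bb0 to bb2 that leaves the cycle.
pub fn loop_example() -> Vec<BasicBlock> {
    vec![
        BasicBlock {
            statements: vec![],
            terminator: Terminator::If { then_block: 2, else_block: 1 },
        },
        BasicBlock { statements: vec!["work()"], terminator: Terminator::Goto(0) },
        BasicBlock { statements: vec![], terminator: Terminator::Return },
    ]
}
```

Under these assumptions, has_cycle returns false for if_example() and true for loop_example().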

What is a dataflow analysis?

Static Program Analysis by Anders Møller and Michael I. Schwartzbach is an incredible resource!

to be written

What is "universally quantified"? What about "existentially quantified"?

to be written

What is co- and contra-variance?

Check out the subtyping chapter from the Rust Nomicon.

See the variance chapter of this guide for more info on how the type checker handles variance.

What is a "free region" or a "free variable"? What about "bound region"?

Let's describe the concepts of free vs bound in terms of program variables, since that's the thing we're most familiar with.

  • Consider this expression, which creates a closure: |a, b| a + b. Here, the a and b in a + b refer to the arguments that the closure will be given when it is called. We say that the a and b there are bound to the closure, and that the closure signature |a, b| is a binder for the names a and b (because any references to a or b within refer to the variables that it introduces).
  • Consider this expression: a + b. In this expression, a and b refer to local variables that are defined outside of the expression. We say that those variables appear free in the expression (i.e., they are free, not bound (tied up)).

So there you have it: a variable "appears free" in some expression/statement/whatever if it refers to something defined outside of that expression/statement/whatever. Equivalently, we can then refer to the "free variables" of an expression – which is just the set of variables that "appear free".

So what does this have to do with regions? Well, we can apply the analogous concept to types and regions. For example, in the type &'a u32, 'a appears free. But in the type for<'a> fn(&'a u32), it does not, because the for<'a> is a binder for 'a.
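
The same distinctions can be seen in ordinary Rust code. The following sketch uses made-up function names, and adds type annotations to the closures for clarity:

```rust
// `a` and `b` are bound by the closure's binder `|a, b|`.
pub fn bound_example() -> i32 {
    let add = |a: i32, b: i32| a + b;
    add(1, 2)
}

// Inside the closure body `a + offset`, `a` is bound, but `offset` appears
// free: it refers to a local variable defined outside the closure.
pub fn free_example() -> i32 {
    let offset = 10;
    let add_offset = |a: i32| a + offset;
    add_offset(1)
}

// Taken on its own, the type `&'a u32` has 'a appearing free; here it is
// the `<'a>` on the enclosing function that declares it.
pub fn get<'a>(x: &'a u32) -> &'a u32 {
    x
}

// In `for<'a> fn(&'a u32) -> &'a u32`, the `for<'a>` is a binder, so 'a is
// bound within the type and the type mentions no free regions. The callee
// must therefore work for any lifetime, including that of `local`.
pub fn call_with_any_lifetime(f: for<'a> fn(&'a u32) -> &'a u32) -> u32 {
    let local = 7;
    *f(&local)
}
```

Passing get to call_with_any_lifetime works because get is usable for every choice of 'a, which is exactly what the for<'a> binder demands.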

Appendix C: Glossary

The compiler uses a number of...idiosyncratic abbreviations and things. This glossary attempts to list them and give you a few pointers for understanding them better.

| Term | Meaning |
|------|---------|
| AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely. |
| binder | a "binder" is a place where a variable or type is declared; for example, the <T> is a binder for the generic type parameter T in fn foo<T>(..), and \|a\| ... is a binder for the parameter a. See the background chapter for more |
| bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable a is bound within the closure expression \|a\| a * 2. See the background chapter for more |
| codegen | the code to translate MIR into LLVM IR. |
| codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. |
| completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness"). |
| control-flow graph | a representation of the control-flow of a program; see the background chapter for more |
| CTFE | Compile-Time Function Evaluation. This is the ability of the compiler to evaluate const fns at compile time. This is part of the compiler's constant evaluation system. (see more) |
| cx | we tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc. |
| DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. (see more) |
| data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see the background chapter for more |
| DefId | an index identifying a definition (see librustc/hir/def_id.rs). Uniquely identifies a DefPath. |
| Double pointer | a pointer with additional metadata. See "fat pointer" for more. |
| DST | Dynamically-Sized Type. A type for which the compiler cannot statically know the size in memory (e.g. str or [u8]). Such types don't implement Sized and cannot be allocated on the stack. They can only occur as the last field in a struct. They can only be used behind a pointer (e.g. &str or &[u8]). |
| empty type | see "uninhabited type". |
| Fat pointer | a two word value carrying the address of some value, along with some further information necessary to put the value to use. Rust includes two kinds of "fat pointers": references to slices, and trait objects. A reference to a slice carries the starting address of the slice and its length. A trait object carries a value's address and a pointer to the trait's implementation appropriate to that value. "Fat pointers" are also known as "wide pointers", and "double pointers". |
| free variable | a "free variable" is one that is not bound within an expression or term; see the background chapter for more |
| 'gcx | the lifetime of the global arena (see more) |
| generics | the set of generic type parameters defined on a type or item |
| HIR | the High-level IR, created by lowering and desugaring the AST (see more) |
| HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset". |
| HIR Map | The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers. |
| ICE | internal compiler error. When the compiler crashes. |
| ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled. |
| inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type. |
| infcx | the inference context (see librustc/infer) |
| IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it. |
| local crate | the crate currently being compiled. |
| LTO | Link-Time Optimizations. A set of optimizations offered by LLVM that occur just before the final binary is linked. These include optimizations like removing functions that are never used in the final program, for example. ThinLTO is a variant of LTO that aims to be a bit more scalable and efficient, but possibly sacrifices some optimizations. You may also read issues in the Rust repo about "FatLTO", which is the loving nickname given to non-Thin LTO. LLVM documentation: here and here |
| LLVM | (actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that outputs LLVM IR and use LLVM to compile to all the platforms LLVM supports. |
| MIR | the Mid-level IR that is created after type-checking for use by borrowck and codegen (see more) |
| miri | an interpreter for MIR used for constant evaluation (see more) |
| normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to associated type normalization |
| newtype | a "newtype" is a wrapper around some other type (e.g., struct Foo(T) is a "newtype" for T). This is commonly used in Rust to give a stronger type for indices. |
| NLL | non-lexical lifetimes, an extension to Rust's borrowing system that bases it on the control-flow graph. |
| node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with HirId. |
| obligation | something that must be proven by the trait system (see more) |
| projection | a general term for a "relative path", e.g. x.f is a "field projection", and T::Item is an "associated type projection" |
| promoted constants | constants extracted from a function and lifted to static scope; see this section for more details. |
| provider | the function that executes a query (see more) |
| quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which this is true?" or "is this true for all types T?"; see the background chapter for more |
| query | a sub-computation performed during compilation (see more) |
| region | another term for "lifetime" often used in the literature and in the borrow checker. |
| rib | a data structure in the name resolver that keeps track of a single scope for names. (see more) |
| sess | the compiler session, which stores global data used throughout compilation |
| side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node. |
| sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, & is a sigil for references. |
| placeholder | NOTE: the term "skolemization" is deprecated in favor of "placeholder". A way of handling subtyping around "for-all" types (e.g., for<'a> fn(&'a u32)) as well as solving higher-ranked trait bounds (e.g., for<'a> T: Trait<'a>). See the chapter on placeholder and universes for more details. |
| soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe Rust) force a value into a variable of the wrong type. (see "completeness"). |
| span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more. |
| substs | the substitutions for a given generic type or item (e.g. the i32, u32 in HashMap<i32, u32>) |
| tcx | the "typing context", main data structure of the compiler (see more) |
| 'tcx | the lifetime of the currently active inference context (see more) |
| trait reference | the name of a trait along with a suitable set of input type/lifetimes (see more) |
| token | the smallest unit of parsing. Tokens are produced after lexing (see more). |
| TLS | Thread-Local Storage. Variables may be defined so that each thread has its own copy (rather than all threads sharing the variable). This has some interactions with LLVM. Not all platforms support TLS. |
| trans | the code to translate MIR into LLVM IR. Renamed to codegen. |
| trait reference | a trait and values for its type parameters (see more). |
| ty | the internal representation of a type (see more). |
| UFCS | Universal Function Call Syntax. An unambiguous syntax for calling a method (see more). |
| uninhabited type | a type which has no values. This is not the same as a ZST, which has exactly 1 value. An example of an uninhabited type is enum Foo {}, which has no variants, and so, can never be created. The compiler can treat code that deals with uninhabited types as dead code, since there is no such value to be manipulated. ! (the never type) is an uninhabited type. Uninhabited types are also called "empty types". |
| upvar | a variable captured by a closure from outside the closure. |
| variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if T is a subtype of U, then Vec<T> is a subtype of Vec<U> because Vec is covariant in its generic parameter. See the background chapter for a more general explanation. See the variance chapter for an explanation of how type checking handles variance. |
| Wide pointer | a pointer with additional metadata. See "fat pointer" for more. |
| ZST | Zero-Sized Type. A type whose values have size 0 bytes. Since 2^0 = 1, such types can have exactly one value. For example, () (unit) is a ZST. struct Foo; is also a ZST. The compiler can do some nice optimizations around ZSTs. |
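
A few of these entries (fat pointer, DST, ZST, newtype) describe properties that can be observed directly with std::mem::size_of. The snippet below is a small illustration; the names Marker, BlockIndex, Shape, and is_two_words are invented for this example.

```rust
use std::mem::size_of;

// A zero-sized type: it has exactly one value and occupies no memory.
pub struct Marker;

// A newtype giving a stronger type to an index; it has the same size as
// the u32 it wraps.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct BlockIndex(pub u32);

// A trait used only to form the trait object type `dyn Shape`.
pub trait Shape {
    fn area(&self) -> f64;
}

// True if a reference to `T` is a two-word ("fat") pointer. References to
// DSTs such as `[u8]`, `str`, and `dyn Shape` carry a length or a vtable
// pointer next to the address; references to sized types do not.
pub fn is_two_words<T: ?Sized>() -> bool {
    size_of::<&T>() == 2 * size_of::<usize>()
}

// Sizes of two ZSTs and of the newtype, for inspection.
pub fn example_sizes() -> (usize, usize, usize) {
    (size_of::<()>(), size_of::<Marker>(), size_of::<BlockIndex>())
}
```

With these definitions, is_two_words is true for [u8], str, and dyn Shape and false for u8, while example_sizes reports zero for both ZSTs and the size of a u32 for the newtype.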

Appendix D: Code Index

rustc has a lot of important data structures. This is an attempt to give some guidance on where to learn more about some of the key data structures of the compiler.

| Item | Kind | Short description | Chapter | Declaration |
|------|------|-------------------|---------|-------------|
| BodyId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc/hir/mod.rs |
| CompileState | struct | State that is passed to a callback at each compiler pass | The Rustc Driver | src/librustc_driver/driver.rs |
| ast::Crate | struct | A syntax-level representation of a parsed crate | The parser | src/libsyntax/ast.rs |
| hir::Crate | struct | A more abstract, compiler-friendly form of a crate's AST | The Hir | src/librustc/hir/mod.rs |
| DefId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc/hir/def_id.rs |
| DiagnosticBuilder | struct | A struct for building up compiler diagnostics, such as errors or lints | Emitting Diagnostics | src/librustc_errors/diagnostic_builder.rs |
| DocContext | struct | A state container used by rustdoc when crawling through a crate to gather its documentation | Rustdoc | src/librustdoc/core.rs |
| HirId | struct | One of four types of HIR node identifiers | Identifiers in the HIR | src/librustc/hir/mod.rs |
| NodeId | struct | One of four types of HIR node identifiers. Being phased out | Identifiers in the HIR | src/libsyntax/ast.rs |
| ParamEnv | struct | Information about generic parameters or Self, useful for working with associated or generic items | Parameter Environment | src/librustc/ty/mod.rs |
| ParseSess | struct | This struct contains information about a parsing session | The parser | src/libsyntax/parse/mod.rs |
| Rib | struct | Represents a single scope of names | Name resolution | src/librustc_resolve/lib.rs |
| Session | struct | The data associated with a compilation session | The parser, The Rustc Driver | src/librustc/session/mod.rs |
| SourceFile | struct | Part of the SourceMap. Maps AST nodes to their source code for a single source file. Was previously called FileMap | The parser | src/libsyntax_pos/lib.rs |
| SourceMap | struct | Maps AST nodes to their source code. It is composed of SourceFiles. Was previously called CodeMap | The parser | src/libsyntax/source_map.rs |
| Span | struct | A location in the user's source code, used for error reporting primarily | Emitting Diagnostics | src/libsyntax_pos/span_encoding.rs |
| StringReader | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | The parser | src/libsyntax/parse/lexer/mod.rs |
| syntax::token_stream::TokenStream | struct | An abstract sequence of tokens, organized into TokenTrees | The parser, Macro expansion | src/libsyntax/tokenstream.rs |
| TraitDef | struct | This struct contains a trait's definition with type information | The ty modules | src/librustc/ty/trait_def.rs |
| TraitRef | struct | The combination of a trait and its input types (e.g. P0: Trait<P1...Pn>) | Trait Solving: Goals and Clauses, Trait Solving: Lowering impls | src/librustc/ty/sty.rs |
| Ty<'tcx> | struct | This is the internal representation of a type used for type checking | Type checking | src/librustc/ty/mod.rs |
| TyCtxt<'cx, 'tcx, 'tcx> | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries | The ty modules | src/librustc/ty/context.rs |