The compiler testing framework
The Rust project runs a wide variety of different tests, orchestrated by the build system (x.py test). The main test harness for testing the compiler itself is a tool called compiletest (sources in src/tools/compiletest). This section gives a brief overview of how the testing framework is set up, and then gets into some of the details on how to run tests as well as how to add new tests.
Compiletest test suites
The compiletest tests are located in the tree in the src/test directory. Immediately within you will see a series of subdirectories (ui, run-make, and so forth). Each of those directories is called a test suite – they house a group of tests that are run in a distinct mode.
Here is a brief summary of the test suites as of this writing and what they mean. In some cases, the test suites are linked to parts of the manual that give more details.
- ui – tests that check the exact stdout/stderr from compilation and/or running the test (a small example follows this list)
- run-pass – tests that are expected to compile and execute successfully (no panics)
- run-pass-valgrind – tests that ought to run with valgrind
- run-fail – tests that are expected to compile but then panic during execution
- compile-fail – tests that are expected to fail compilation
- parse-fail – tests that are expected to fail to parse
- pretty – tests targeting the Rust "pretty printer", which generates valid Rust code from the AST
- debuginfo – tests that run in gdb or lldb and query the debug info
- codegen – tests that compile and then test the generated LLVM code to make sure that the optimizations we want are taking effect
- mir-opt – tests that check parts of the generated MIR to make sure we are building things correctly or doing the optimizations we expect
- incremental – tests for incremental compilation, checking that when certain modifications are performed, we are able to reuse the results from previous compilations
- run-make – tests that basically just execute a Makefile; the ultimate in flexibility but quite annoying to write
- rustdoc – tests for rustdoc, making sure that the generated files contain the expected documentation
- *-fulldeps – same as above, but indicates that the test depends on things other than libstd (and hence those things must be built)
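To make the ui and compile-fail styles concrete, here is a hedged sketch of roughly what such a test file can look like. The file contents and the exact diagnostic are illustrative: compile-fail tests use inline //~ ERROR annotations to assert that a particular diagnostic is reported on that line, while ui tests additionally compare the compiler's stdout/stderr against checked-in expected-output files (not shown here).

```rust
// A sketch of a compile-fail/ui style test: compiletest compiles this file
// and checks that the expected diagnostic is produced on the annotated line.

fn main() {
    let _x: u32 = "hello"; //~ ERROR mismatched types
}
```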
The Rust build system handles running tests for various other things, including:
- Tidy – This is a custom tool used for validating source code style and formatting conventions, such as rejecting long lines. There is more information in the section on coding conventions.
  Example: ./x.py test src/tools/tidy
- Unit tests – The Rust standard library and many of the Rust packages include typical Rust #[test] unit tests. Under the hood, x.py runs cargo test on each package to run all the tests (see the sketch after this list).
  Example: ./x.py test src/libstd
- Doc tests – Example code embedded within Rust documentation is executed via rustdoc --test (also shown in the sketch after this list). Examples:
  ./x.py test src/doc – Runs rustdoc --test for all documentation in src/doc.
  ./x.py test --doc src/libstd – Runs rustdoc --test on the standard library.
- Link checker – A small tool for verifying href links within documentation.
  Example: ./x.py test src/tools/linkchecker
- Dist check – This verifies that the source distribution tarball created by the build system will unpack, build, and run all tests.
  Example: ./x.py test distcheck
- Tool tests – Packages that are included with Rust have all of their tests run as well (typically by running cargo test within their directory). This includes things such as cargo, clippy, rustfmt, rls, miri, bootstrap (testing the Rust build system itself), etc.
- Cargo test – This is a small tool which runs cargo test on a few significant projects (such as tokei, among others) just to ensure there aren't any significant regressions.
  Example: ./x.py test src/tools/cargotest
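For readers unfamiliar with these two kinds of tests, here is a minimal, hedged sketch. The crate and function names (my_crate, add_one) are invented for illustration; the point is that the #[test] function is picked up by cargo test, while the fenced example in the doc comment is extracted, compiled, and run by rustdoc --test.

````rust
/// Adds one to the given number.
///
/// The example below is a doc test: rustdoc --test extracts it, compiles it
/// against the crate, and runs it.
///
/// ```
/// assert_eq!(my_crate::add_one(1), 2);
/// ```
pub fn add_one(x: u32) -> u32 {
    x + 1
}

#[cfg(test)]
mod tests {
    use super::add_one;

    // An ordinary #[test] unit test; cargo test (invoked by x.py for each
    // in-tree package) compiles and runs functions like this one.
    #[test]
    fn add_one_works() {
        assert_eq!(add_one(41), 42);
    }
}
````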
When a Pull Request is opened on GitHub, Travis will automatically launch a build that runs all tests on a single configuration (x86-64 Linux). In essence, it runs ./x.py test after building.
The integration bot bors is used for coordinating merges to the master branch. When a PR is approved, it goes into a queue where merges are tested one at a time on a wide set of platforms using Travis and AppVeyor (currently over 50 different configurations). Most platforms only run the build steps, some run a restricted set of tests, and only a subset runs the full suite of tests (see Rust's platform tiers).
Testing with Docker images
The Rust tree includes Docker image definitions for the platforms used on Travis in src/ci/docker. The script src/ci/docker/run.sh is used to build the Docker image, run it, build Rust within the image, and run the tests.
TODO: What is a typical workflow for testing/debugging on a platform that you don't have easy access to? Do people build Docker images and enter them to test things out?
Testing on emulators
Some platforms are tested via an emulator for architectures that aren't
readily available. There is a set of tools for orchestrating running the
tests within the emulator. Platforms such as
arm-unknown-linux-gnueabihf are set up to automatically run the tests under
emulation on Travis. The following will take a look at how a target's tests
are run under emulation.
The Docker image for armhf-gnu includes QEMU to emulate the ARM CPU architecture. Included in the Rust tree are the tools remote-test-client and remote-test-server, which are programs for sending test programs and libraries to the emulator, running the tests within the emulator, and reading the results. The Docker image is set up to launch remote-test-server, and the build tools use remote-test-client to communicate with the server to coordinate running tests.
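The real protocol is defined by the remote-test-client and remote-test-server sources themselves; the following is only an illustrative sketch of the overall idea (upload a test binary, run it remotely, and report the exit status back). The message framing, paths, and addresses below are invented for this sketch and are not the actual wire format.

```rust
// Illustrative only: a toy "remote test" client in the spirit of
// remote-test-client. The framing (length prefix + raw binary, then a
// 4-byte exit code) is made up for clarity.
use std::io::{Read, Write};
use std::net::TcpStream;

fn run_remote_test(addr: &str, test_binary: &[u8]) -> std::io::Result<i32> {
    let mut stream = TcpStream::connect(addr)?;

    // Send the size of the test binary followed by its contents.
    stream.write_all(&(test_binary.len() as u64).to_le_bytes())?;
    stream.write_all(test_binary)?;

    // The server is assumed to execute the binary under emulation and reply
    // with the process exit code as a little-endian i32.
    let mut buf = [0u8; 4];
    stream.read_exact(&mut buf)?;
    Ok(i32::from_le_bytes(buf))
}

fn main() -> std::io::Result<()> {
    // Hypothetical binary path and server address, purely for illustration.
    let binary = std::fs::read("target/arm-test")?;
    let code = run_remote_test("127.0.0.1:12345", &binary)?;
    println!("remote test exited with code {}", code);
    Ok(())
}
```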
TODO: What are the steps for manually running tests within an emulator?
./src/ci/docker/run.sh armhf-gnu will do everything, but takes hours to run and doesn't offer much help with interacting within the emulator.
Is there any support for emulating other (non-Android) platforms, such as running on an iOS emulator?
Is there anything else interesting that can be said here about running tests remotely on real hardware?
It's also unclear to me how the wasm or asm.js tests are run.
Crater
Crater is a tool for compiling and running tests for every crate on crates.io (and a few on GitHub). It is mainly used for checking the extent of breakage when implementing potentially breaking changes, and for ensuring a lack of breakage by comparing beta and stable compiler versions.
When to run Crater
You should request a crater run if your PR makes large changes to the compiler or could cause breakage. If you are unsure, feel free to ask your PR's reviewer.
Requesting Crater Runs
The Rust team maintains a few machines that can be used for running crater runs on the changes introduced by a PR. If your PR needs a crater run, leave a comment for the triage team in the PR thread. Please inform the team whether you require a "check-only" crater run, a "build-only" crater run, or a "build-and-test" crater run. The difference is primarily in time; the conservative option (if you're not sure) is to go for the build-and-test run. If your changes will only have an effect at compile time (e.g., implementing a new trait), then you only need a check run.
Your PR will be enqueued by the triage team and the results will be posted when they are ready. Check runs take around 3-4 days, with the other two taking 5-6 days on average.
While crater is really useful, it is also important to be aware of a few caveats:
- Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere. Also, companies may not wish to publish their code. Thus, a successful crater run is not a magic green light that there will be no breakage; you still need to be careful.
- Crater only runs Linux builds on x86_64. Thus, other architectures and platforms are not tested. Critically, this includes Windows.
- Many crates are not tested. This could be for a lot of reasons, including that the crate doesn't compile any more (e.g. it used old nightly features), has broken or flaky tests, requires network access, or other reasons.
- Before crater can be run, @bors try needs to succeed in building artifacts. This means that if your code doesn't compile, you cannot run crater.
Perf runs
A lot of work is put into improving the performance of the compiler and preventing performance regressions. A "perf run" is used to compare the performance of the compiler in different configurations for a large collection of popular crates. Different configurations include "fresh builds", builds with incremental compilation, etc.
The result of a perf run is a comparison between two versions of the compiler (by their commit hashes).
You should request a perf run if your PR may affect performance, especially if it can affect performance adversely.
The following blog posts may also be of interest:
- brson's classic "How Rust is tested"