Making compiletest more maintainable: reworking directive handling

Metadata
:-----------------	-------------------------
Point of contact	Jieyou Xu
Teams	bootstrap, compiler, rustdoc
Task owners	Jieyou Xu
Status	Proposed
Zulip channel	N/A
Tracking issue	rust-lang/rust-project-goals#259

Summary

Short description of what you will do over the next 6 months.

Rework [compiletest]'s directive handling to make it more maintainable, have better UX for compiler contributors, and fix some long-standing issues.

rustc relies on the test infrastructure implemented by the test harness [compiletest] (supported by bootstrap) to run the test suites under tests/ (e.g. ui tests, mir-opt tests, run-make tests, etc.). However, [compiletest] is currently very undertested and undermaintained, which is not ideal because we rely on the test suites to check rustc's behavior. The current implementation in [compiletest] is also such that it's very hard and unpleasant to make changes (e.g. adding new directives) to provide up-to-date test infrastructure support for the needs of compiler (and rustdoc) contributors. The UX is not great either because of poor error handling and error reporting.

The status quo

The current status quo is that [compiletest] imposes significant friction for compiler (and rustdoc) contributors who want to run tests and diagnose test failures. [compiletest] error messages are opaque, terse and hard to read. We had to include a separate allow-list of known directives to detect unknown directives. We still sometimes let malformed directives through and silently do nothing. Argument splitting is naive and inconsistent. The implementation is very convoluted. Also there's still insufficient documentation.

See the tracking issue of various directive handling related bugs.

The next 6 months

The key changes I want to achieve:

Directive handling is testable (at all) and in addition have strong test coverage.
Directives have stricter syntax to reduce ambiguity and enable invalid directive detection or make invalid directive detection easier.
Directives are well-documented. Move directive documentation close to directives themselves and make it possible to be generated alongside tool docs for compiletest, so it's less likely to become outdated and to enable documentation coverage enforcement.
- Also, make sure that we have robust self documentation so it's not only one or two contributors who understands how things work inside compiletest...
Generally improve directive handling robustness. Examples: fixing argument splitting in compile-flags, fix paths related to aux-build, etc.
Test writers and reviewers can receive better diagnostics, for things like a directive is not accepted in a given test suite or why something in compiletest failed.

The "shiny future" we are working towards

My long-term goal for [compiletest] is that I want it to make it significantly easier to maintain. Concretely, this means significantly better test coverage, easier to extend, better documentation. Hopefully, by being more maintainable, we are able to attract more active maintainers from both bootstrap and compiler teams and make the code base significantly more pleasant to work on.

For directive handling specifically, it should mean that:

It's relatively straightforward and low friction to implement new directives, including test coverage and documentation. It should be easy to do the right thing.
[compiletest] should produce error messages that are easy to read and understand, possibly even making suggestions.
Directives should be documented (and enforced to be documented) via rustdoc which are made available on nightly-rustc docs so we can back-link from dev-guide and not have to maintain two sets of docs that are mutually inconsistent.

Ownership and team asks

Owner: [Jieyou Xu]

Note that [compiletest] is (in theory) currently co-maintained by both t-bootstrap and t-compiler, but AFAIK is (in practice) currently not really actively maintained by anyone else. The following team asks are probably mostly compiler for feedback on their use cases (as a test infra consumer) and bootstrap for implementation review.

Task	Owner(s) or team(s)	Notes
Discussion and moral support	bootstrap, compiler, rustdoc	including consultations for desired test behaviors and testing infra consumers
Experimental prototype implementation¹	Jieyou Xu	to see how approaches look like and gain experience/feedback
[`compiletest`] changes w/ experience from prototype	Jieyou Xu
Standard reviews	bootstrap, compiler	Probably mostly bootstrap or whoever is more interested in reviewing [`compiletest`] changes
Inside Rust blog post for project outcome	Jieyou Xu

I want to start with an out-of-tree experimental prototype to see how the pieces are fit together to make it easier to rapidly iterate and receive feedback without having to mess with the "live" [compiletest] that does not have sufficient test coverage.

Definitions

Definitions for terms used above:

Discussion and moral support is the lowest level offering, basically committing the team to nothing but good vibes and general support for this endeavor.
Author RFC and Implementation means actually writing the code, document, whatever.
Design meeting means holding a synchronous meeting to review a proposal and provide feedback (no decision expected).
RFC decisions means reviewing an RFC and deciding whether to accept.
Org decisions means reaching a decision on an organizational or policy matter.
Secondary review of an RFC means that the team is "tangentially" involved in the RFC and should be expected to briefly review.
Stabilizations means reviewing a stabilization and report and deciding whether to stabilize.
Standard reviews refers to reviews for PRs against the repository; these PRs are not expected to be unduly large or complicated.
Prioritized nominations refers to prioritized lang-team response to nominated issues, with the expectation that there will be some response from the next weekly triage meeting.
Dedicated review means identifying an individual (or group of individuals) who will review the changes, as they're expected to require significant context.
Other kinds of decisions:
- Lang team experiments are used to add nightly features that do not yet have an RFC. They are limited to trusted contributors and are used to resolve design details such that an RFC can be written.
- Compiler Major Change Proposal (MCP) is used to propose a 'larger than average' change and get feedback from the compiler team.
- Library API Change Proposal (ACP) describes a change to the standard library.

Frequently asked questions

TODO: pending during project discussions

[Jieyou Xu]: https://github.com/jieyouxu [compiletest]: https://github.com/rust-lang/rust/tree/master/src/tools/compiletest

Rust Project Goals