T-compiler Meeting Agenda 2022-06-02

Announcements

Tomorrow <time:2022-06-03T10:00:00-04:00 Types Team: Planning/Deep-Dive calendar link
Reminder: if you see a PR/issue that seems like there might be legal implications due to copyright/IP/etc, please let the Core team know (or at least message @pnkfelix or @Wesley Wiser so we can pass it along).
Stable MIR MVP: https://github.com/rust-lang/rust/pull/97385
- basically just a single entry point for all external users of MIR
- expected to evolve to wrap common APIs with less-frequently changing ones.
- no path towards actual stabilization yet

Other WG meetings (calendar link)

time:2022-06-06T16:00:00+02:00 wg-debugging status & design meeting
time:2022-06-06T17:00:00+02:00 wg-rls-2.0 steering meeting
time:2022-06-06T17:30:00+02:00 Async WG triage meeting
time:2022-06-06T22:00:00+02:00 [wg-traits] GATs Sync
time:2022-06-15T15:00:00+02:00 [Types team] Hack session: Advanced subtyping

MCPs/FCPs

New MCPs (take a look, see if you like them!)
- No new proposals this time.
Old MCPs (not seconded, take a look)
- “Tier 3 target proposal: riscv64gc-linux-android (Android target for riscv64gc)” compiler-team#472 (last review activity: 5 months ago)
  - :loudspeaker: Stale MCP: candidate for closing
  - Zulip discussion
  - MCP author replied to all questions, so it seems no visible concern to be addressed
- “-Dwarnings to cover all warnings” compiler-team#473 (last review activity: 5 months ago)
- “Dealing with type/const ambiguities” compiler-team#480 (last review activity: 4 months ago)
- “Removing codegen logic for nvptx-nvidia-cuda (32-bit target)” compiler-team#496 (last review activity: 2 months ago)
- “Arbitrary annotations in compiletest” compiler-team#513 (last review activity: about 24 days ago)
Pending FCP requests (check your boxes!)
- “Stabilize -Zgcc-ld=lld as -Clink-self-contained=linker -Clinker-flavor=gcc-lld” compiler-team#510
- “Increase the minimum linux-gnu versions” rust#95026
  - Note: T-libs discussed and approved this change ([comment])(https://github.com/rust-lang/rust/pull/95026#issuecomment-1144222893)
Things in FCP (make sure you’re good with it)
- “Build-time execution sandboxing” compiler-team#475
- “Change compiletest declarations parsing” compiler-team#512
Accepted MCPs
- No new accepted proposals this time.
Finalized FCPs (disposition merge)
- “Tracking Issue for -Z terminal-width” rust#84673
- “Remove migrate borrowck mode” rust#95565
- “Stabilize the bundle native library modifier” rust#95818
- “Modify MIR building to drop repeat expressions with length zero” rust#95953
- “Remove label/lifetime shadowing warnings” rust#96296

WG checkins

@_WG-traits (impl trait) by @nikomatsakis @oli (prev. checkin)
@_WG-llvm by @nagisa and @Nikita Popov (prev. checkin)

Last bits for opaque pointer support in rust have landed (https://github.com/rust-lang/rust/pull/94214) and I’ve enabled them in LLVM today, so we should be able to pick this up with the LLVM 15 upgrade without further changes.

There’s a proposal for first-class aborting unwind support (https://discourse.llvm.org/t/rfc-add-call-unwindabort-to-llvm-ir/62543), that will likely be useful for rust as well.

There’s a PR for virtual function elimination with LTO (https://github.com/rust-lang/rust/pull/96285). Though looks like it’s not entirely sound at the moment.

Backport nominations

T-compiler beta / T-compiler stable

:beta: “don’t do Sized and other return type checks on RPIT’s real type” rust#97431
- nominated by @oli
- fixes rust#97226, ICE when doing Sized check against the RPIT’s real type
:beta: “Fix indices and remove some unwraps in arg mismatch algorithm” rust#97557
- patch authored and nominated for backport by @Michael Goulet (compiler-errors)
- Fixes rust#97484 (ICE when parsing args)
- not yet merged (cc @Jack Huey for review)

T-rustdoc beta / T-rustdoc stable

No backport nominations for T-rustdoc this time.

:back: / :shrug: / :hand:

PRs S-waiting-on-team

T-compiler

No PRs waiting on T-compiler this time.

Oldest PRs waiting for review

T-compiler

“Add option to pass environment variables” rust#94387 (last review activity: about 33 days ago)
- cc latest reviewers: @Esteban Küber and @bjorn3
“Clone suggestions” rust#95115 (last review activity: about 26 days ago)
- assigned reviewer @Esteban Küber which suggested an additional reviewer (comment)
- also cc: @oli (as they left comments, too)
“Add Finalize statement to make deaggregation “reversible” by storing all information in MIR” rust#96043 (last review activity: about 25 days ago)
- maybe a comment on perf run results? maybe cc: @Wesley Wiser
“Remove opaque types from typeck expectations” rust#96727 (last review activity: about 25 days ago)
- assigned for review to @lcnr
“Print type of every call in a method call chain” rust#96918 (last review activity: about 20 days ago)
- cc: @Michael Goulet (compiler-errors) maybe for comment on latest author comment?

Issues of Note

Short Summary

P-critical

T-compiler

“Infinite recursion optimized away with opt-level=z” rust#97428
- @Nikita Popov already provided patch LLVM upstream and backport for our LLVM fork (comment)
“Wrong cast of u16 to usize on aarch64” rust#97463
- very old unsoundness issue (comment) on tier 1 platform (tier support list), surfaced just now

T-rustdoc

No P-critical issues for T-rustdoc this time.

P-high regressions

P-high beta regressions

No P-high beta regressions this time.

Unassigned P-high nightly regressions

No unassigned P-high nightly regressions this time.

Performance logs

triage logs for 2022-05-31

A good week: The regressions were small; some have follow-up PR’s in flight to address them; and we saw a big improvement from PR #97345, which adds more fast paths for quickly exiting comparisons between two types (such as BitsImpl<M> and BitsImpl<N> for const integers M and N). This improved compile-times for the bitmaps benchmark by 50-65% in some cases (including the trunk nalgebra, according to independent investigation from nnethercote). That same PR had more modest improvements (1% to 2%) to the compile-times for a number of other crates. Many thanks to lcnr and nnethercote for some excellent work here!

Triage done by @pnkfelix. Revision range: 43d9f385..0a43923a

Summary:

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	0.8%	0.9%	3
Improvements (primary)	-4.0%	-65.9%	227
Improvements (secondary)	-2.0%	-7.7%	217
All (primary)	-4.0%	-65.9%	227

3 Regressions, 1 Improvements, 9 Mixed; 0 of them in rollups 59 artifact comparisons made in total 24 Untriaged Pull Requests

Regressions

Proc macro tweaks #97004 (Comparison Link)

	mean	max	count
Regressions (primary)	0.2%	0.2%	1
Regressions (secondary)	0.6%	1.5%	10
Improvements (primary)	N/A	N/A	0
Improvements (secondary)	N/A	N/A	0
All (primary)	0.2%	0.2%	1

aparently making a Buffer<T> non-generic (its solely instantiated at u8) caused a performance regression.
there is ongoing follow-up work to address the regression, either by making the buffer generic again, or by adding #[inline] annotations.
see e.g. PR 97539

rustdoc: include impl generics / self in search index #96652 (Comparison Link)

	mean	max	count
Regressions (primary)	0.5%	1.0%	3
Regressions (secondary)	1.1%	1.1%	3
Improvements (primary)	N/A	N/A	0
Improvements (secondary)	N/A	N/A	0
All (primary)	0.5%	1.0%	3

This regressed doc-generation for a few primary benchmarks, but I think that might be a inherent cost of a change like this. I marked it as triaged based on that assumption.

Move things to rustc_type_ir #97287 (Comparison Link)

	mean	max	count
Regressions (primary)	0.5%	0.7%	9
Regressions (secondary)	1.1%	1.1%	3
Improvements (primary)	N/A	N/A	0
Improvements (secondary)	N/A	N/A	0
All (primary)	0.5%	0.7%	9

During development, this PR was identified as regressing one primary benchmark, bitmaps (in a variety of contexts), and the PR author said better to take that perf hit and deal with it later.
In the PR that landed, the number of regressing variants was smaller, but the affected set of benchmarks slightly larger: both bitmaps and unicode-normalization are affected, regressing by 0.35% to 0.70%.
My very brief inpsection of the flamegraphs (old, new) didn’t show any smoking guns.
At this point I think we can just live with this performance hit.

Improvements

Add suggestion for relaxing static lifetime bounds on dyn trait impls in NLL #97284 (Comparison Link)

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	N/A	N/A	0
Improvements (primary)	-0.2%	-0.2%	1
Improvements (secondary)	-5.3%	-5.9%	6
All (primary)	-0.2%	-0.2%	1

This managed to improve performance (slightly), probably because it restructured the CallArgument and that had fallout that ended up being positive overall (despite our intuition being that it would hurt performance due to the increase in size).
It might be nice to confirm that hypothesis independently (by isolating just that structural change and confirming that it has a similar effect on performance here) …
… but, the improvements are essentially isolated to just the secondary wg-grammar benchmark, so its not really worth too much digging, except if we think it might reveal other structural changes we should make elsewhere.

Mixed

add a deep fast_reject routine #97345 (Comparison Link)

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	0.9%	1.6%	10
Improvements (primary)	-11.0%	-65.8%	45
Improvements (secondary)	-0.5%	-0.9%	12
All (primary)	-11.0%	-65.8%	45

this was great, as noted in summary.

Move various checks to typeck so them failing causes the typeck result to get tainted #96046 (Comparison Link)

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	0.9%	1.1%	4
Improvements (primary)	-0.3%	-0.4%	4
Improvements (secondary)	-0.3%	-0.3%	24
All (primary)	-0.3%	-0.4%	4

This had some small improvements to primary benchmarks.
The CTFE stress test regressed, but I assume that was expected since this was a change to the CTFE engine (to address some ICE’s).

Update jemalloc to v5.3 #96790 (Comparison Link)

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	0.3%	0.3%	1
Improvements (primary)	-0.9%	-6.2%	212
Improvements (secondary)	-1.1%	-3.2%	191
All (primary)	-0.9%	-6.2%	212

This had various improvements to our primary benchmarks' instruction counts.
The other big thing to observe: the max-rss was improved in several primary benchmarks (by 1% to 6%), and some secondary benchmarks saw even more significant improvements to their max-rss.

Split dead store elimination off dest prop #97158 (Comparison Link)

	mean	max	count
Regressions (primary)	0.5%	1.3%	15
Regressions (secondary)	0.6%	2.3%	12
Improvements (primary)	-0.4%	-1.9%	50
Improvements (secondary)	-0.6%	-1.3%	33
All (primary)	-0.2%	-1.9%	65

The changes here were investigated by the PR author.
Marking as triaged based on their investigation.

Try to cache region_scope_tree as a query #97383 (Comparison Link)

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	1.0%	1.0%	3
Improvements (primary)	-1.7%	-4.7%	98
Improvements (secondary)	-2.4%	-7.7%	43
All (primary)	-1.7%	-4.7%	98

This was a targeted PR to address the regressions introduced by PR #95563 (Comparison Link).
It seems to succeed at recovering the instruction-count performance lost in that PR.
It doesn’t quite recover all of the max-rss lost (PR #97383 improves max-rss by about 3.75% while PR #95563 degraded max-rss by about 4%), but its close enough to satisfy our needs.

proc_macro: don’t pass a client-side function pointer through the server. #97461 (Comparison Link)

	mean	max	count
Regressions (primary)	0.5%	0.5%	2
Regressions (secondary)	N/A	N/A	0
Improvements (primary)	-0.4%	-0.8%	11
Improvements (secondary)	-2.6%	-5.4%	10
All (primary)	-0.3%	-0.8%	13

On the primary benchmarks, this mostly yielded small improvements; the one exception was serde_derive (debug), which regressed by ~0.5% for the full and incr-full variants.
Looking at the flamegraphs for the before and after commits, and at the table at bottom of details page, it seems like the instruction-count regression is in codegen_module
Looking at the history of serde_derive-debug (massively zoomed in on the “Percent Delta from First” Graph kind), it seems reasonable to think that something did happen on this PR.
…. but I also don’t really think its a big enough regression to be worth tearing our hair out over. This is a (smallish) win overall, and even for serde_derive-debug, it is a small regression in the context of much larger wins, so overall the trajectory is good.
Marked as triaged.

Replace #[default_method_body_is_const] with #[const_trait] #96964 (Comparison Link)

	mean	max	count
Regressions (primary)	0.2%	0.3%	9
Regressions (secondary)	0.2%	0.3%	3
Improvements (primary)	N/A	N/A	0
Improvements (secondary)	-1.1%	-1.1%	3
All (primary)	0.2%	0.3%	9

The primary regressions are mostly in variants of stm32f4, with a two serde and one syn thrown in for good measure.
The improvements are solely to ctfe stress test, which I guess makes sense given the PR?
Anyway, the regressions seem minor, and they are contained to an unstable feature that the stdlib is using.
Marking as triaged.

improve format impl for literals #97480 (Comparison Link)

	mean	max	count
Regressions (primary)	0.4%	0.5%	4
Regressions (secondary)	N/A	N/A	0
Improvements (primary)	N/A	N/A	0
Improvements (secondary)	-1.5%	-1.5%	1
All (primary)	0.4%	0.5%	4

This is adding a fast-path so that format!("literal") will compile into the same code as "literal".to_owned().
The primary regression is solely contained to bitmaps.
Its possible that the regression to bitmaps is due to format!("literal") being totally unused in that code; all instances of format! there take an additional argument. So its possible that the extra code to check about whether to use the fast-path is slowing things down there.
But I personally don’t believe that explanation here: Unless I’m misunderstanding the code, there is some amount of macro-expansion into multiple instances of format!, but most of the expanded code is going to be dominated by all the impl blocks, not the relatively few format! instances. (Unless I massively misunderstand how the macros and/or codegen and/or inlining end up linking up here.)
So: I don’t believe the best hypothesis I have for what is happening here.
But I also do not think the regression here is large enough to warrant further investigation.
Marking as triaged.

errors: simplify referring to fluent attributes #97357 (Comparison Link)

	mean	max	count
Regressions (primary)	N/A	N/A	0
Regressions (secondary)	0.4%	0.6%	9
Improvements (primary)	N/A	N/A	0
Improvements (secondary)	-0.4%	-0.5%	4
All (primary)	N/A	N/A	0

There were some regressions here, but they are few, minor, and contained solely to secondary benchmarks (specifically projection-caching and wg-grammar). Marking as triaged.

Nominated Issues

T-compiler

“Update Mac Catalyst support for Clang 13” rust#96392
- nominated by @Esteban Küber in comment to find someone more conversant with that platform
- @__Thom Chiovoloni pinged a list of names that could help (comment)

RFC

No new RFC to be discussed

Next week’s WG checkins

@_WG-traits Traits (generic work of the WG) by @nikomatsakis and @Jack Huey
@_WG-mir-opt MIR Optimizations by @oli