design meeting 2019-10-11

introduction

Dear @T-compiler/meeting,

Today we will be having a design meeting. The topic was originally sketched as “some Zoxc PR”. We’ve since narrowed that down to discuss #62038, which is a refactoring of how dep-graph loading occurs. @Zoxc wrote up a comment giving a summary of the ideas. Note that this PR itself is an incremental step towards #60035, which aims to make dep-graph loading/saving more continuous.

I’d also like to discuss briefly how we should document these changes. We currently have some rustc-dev-guide chapters on incremental compilation (e.g., this chapter goes into detail). I would like to move us to a world where major refactorings like #60035 (but not limited to this one – I think e.g. my recent PR and work on lazy norm fits the bill) come along with a rustc-dev-guide chapter that documents the new state of the world. Maybe we can discuss how that might work and – in the case of THIS PR – who might do that documentation work (I don’t necessarily think it has to be @Zoxc, though they’re also an obvious candidate). (In my ideal world, drafts of that chapter would be available before the PR, but at minimum I think such a chapter should be in place to help with reviewing.)

Questions for discussion

“This is where the performance gain of this PR is.” Have we measured this at all? – nikomatsakis
- Here are some measurements. However, were they taken against a single-threaded or a parallel compiler? – mw
- The results show some improvements and some regressions.
“The one possible performance drawback is that ids can become fragmented since this PR requires us to reuse ids from the previous session.” – in what way is this a performance drawback? Do we have any mechanism to reuse ids or ‘reset’ state after a suitable amount of time?
There is another possible performance drawback: The current dep-graph has to be built two times in the worst case (i.e. full cache, but all invalid). Is that correct?

Minutes and notes from discussion

Review of how existing dep-nodes work
Questions about the PR
- Key idea: change is to load up the old graph and use it as the starting point, editing it in place, rather than copying things out from it into the new graph
- Deleting nodes and garbage
  - when we load nodes, they have no color
  - if at the end of compilation they still have no color, we can delete them
  - we keep a “free list” for indices so they can get re-used
- However:
  - we never shrink the graph as a whole, so if there used to be a lot of ids, we will keep the graph the same size
    - in particular there could be an exceptional execution that creates a lot of nodes
  - ids may also become fragmented over time
- How could we address this?
  - some form of “compression” step when writing back to disk
  - complication: query-result-cache uses ids as keys
    - “if we want to garbage collect them, it must be done in sync with the query result cache”
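The deletion and compaction ideas above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the code from #62038: the names `Graph`, `Node`, `Color`, `sweep`, `compact`, and `rekey_cache` are all hypothetical, node ids are plain `usize` indices, and the query result cache is modeled as a simple map keyed by id. It shows nodes loaded without a color, uncolored nodes being deleted at the end of the session with their ids pushed onto a free list, and a compaction step that returns an old-to-new id table so an id-keyed cache can be rewritten in the same step.

```rust
use std::collections::HashMap;

/// Color assigned to a node during the current session (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Color {
    Red,
    Green,
}

#[derive(Debug)]
struct Node {
    label: String,
    /// `None` means the node was loaded but not yet visited this session.
    color: Option<Color>,
}

/// Hypothetical in-place graph: a slot is `None` once its node is deleted.
struct Graph {
    slots: Vec<Option<Node>>,
    /// Ids of deleted slots that are available for reuse.
    free: Vec<usize>,
}

impl Graph {
    /// Load the previous session's nodes; they all start without a color.
    fn load(labels: &[&str]) -> Self {
        let slots = labels
            .iter()
            .map(|l| Some(Node { label: l.to_string(), color: None }))
            .collect();
        Graph { slots, free: Vec::new() }
    }

    fn label(&self, id: usize) -> Option<&str> {
        self.slots.get(id)?.as_ref().map(|n| n.label.as_str())
    }

    /// Color an existing node in place.
    fn mark(&mut self, id: usize, color: Color) {
        if let Some(Some(node)) = self.slots.get_mut(id) {
            node.color = Some(color);
        }
    }

    /// Add a new node, reusing a freed id when one is available.
    fn alloc(&mut self, label: &str, color: Color) -> usize {
        let node = Node { label: label.to_string(), color: Some(color) };
        match self.free.pop() {
            Some(id) => {
                self.slots[id] = Some(node);
                id
            }
            None => {
                self.slots.push(Some(node));
                self.slots.len() - 1
            }
        }
    }

    /// End of session: delete nodes that still have no color.
    /// The graph does not shrink; the ids only move to the free list.
    fn sweep(&mut self) {
        for (id, slot) in self.slots.iter_mut().enumerate() {
            let dead = slot.as_ref().map_or(false, |n| n.color.is_none());
            if dead {
                *slot = None;
                self.free.push(id);
            }
        }
    }

    /// Compaction: pack live nodes densely and return an old -> new id table.
    fn compact(&mut self) -> HashMap<usize, usize> {
        let mut remap = HashMap::new();
        let mut packed = Vec::new();
        for (old, slot) in self.slots.drain(..).enumerate() {
            if let Some(node) = slot {
                remap.insert(old, packed.len());
                packed.push(Some(node));
            }
        }
        self.slots = packed;
        self.free.clear();
        remap
    }
}

/// Rewrite an id-keyed cache with the remap table, dropping entries whose
/// nodes were deleted. This stands in for the query result cache.
fn rekey_cache<V>(cache: HashMap<usize, V>, remap: &HashMap<usize, usize>) -> HashMap<usize, V> {
    cache
        .into_iter()
        .filter_map(|(old, v)| remap.get(&old).map(|&new| (new, v)))
        .collect()
}
```

In this sketch, `sweep` alone never reduces `slots.len()`, which mirrors the concern that one exceptional session can leave the graph permanently large. `compact` fixes the size and the fragmentation, but only stays correct if every id-keyed structure is rewritten with the same table, which is the "in sync with the query result cache" constraint noted above.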
Conclusions
- We had a good discussion of the PR in question. I don’t think we raised any red flags; the approach seems solid.
- The real question is whether we want to move in the overall direction proposed by #60035
- The goal here is to reduce the cost of loading/saving the dep-graph
- #60035 proposed to do so by incrementally dumping out changed nodes and not retaining them in memory
- This we barely touched on – some open questions for me
  - Are there alternative designs we have in mind?
  - Is #60035 itself an “end state” or a stepping stone?
- Conclusion:
  - Follow-up meeting to dig into design of #60035 and maybe discuss alternatives
  - In parallel, Niko will review #62038 now that he understands roughly what it is trying to do
    - We would also want to write docs for rustc-dev-guide
    - We can figure out how that happens in parallel