Summary

Add a high-level intermediate representation (HIR) to the compiler. This is basically a new (and additional) AST more suited for use by the compiler.

This is purely an implementation detail of the compiler. It has no effect on the language.

Note that adding a HIR does not preclude adding a MIR or LIR in the future.

Motivation

Currently the AST is used by libsyntax for syntactic operations, by the compiler for pretty much everything, and in syntax extensions. I propose splitting the AST into a libsyntax version that is specialised for syntactic operation and will eventually be stabilised for use by syntax extensions and tools, and the HIR which is entirely internal to the compiler.

The benefit of this split is that each AST can be specialised to its task and we can separate the interface to the compiler (the AST) from its implementation (the HIR). Specific changes I see that could happen are more ids and spans in the AST, the AST adhering more closely to the surface syntax, the HIR becoming more abstract (e.g., combining structs and enums), and using resolved names in the HIR (i.e., performing name resolution as part of the AST->HIR lowering).

Not using the AST in the compiler means we can work to stabilise it for syntax extensions and tools: it will become part of the interface to the compiler.

I also envisage all syntactic expansion of language constructs (e.g., for loops, if let) moving to the lowering step from AST to HIR, rather than being AST manipulations. That should make both error messages and tool support better for such constructs. It would be nice to move lifetime elision to the lowering step too, in order to make the HIR as explicit as possible.

Detailed design

Initially, the HIR will be an (almost) identical copy of the AST and the lowering step will simply be a copy operation. Since some constructs (macros, for loops, etc.) are expanded away in libsyntax, these will not be part of the HIR. Tools such as the AST visitor will need to be duplicated.

The compiler will be changed to use the HIR throughout (this should mostly be a matter of change the imports). Incrementally, I expect to move expansion of language constructs to the lowering step. Further in the future, the HIR should get more abstract and compact, and the AST should get closer to the surface syntax.

Drawbacks

Potentially slower compilations and higher memory use. However, this should be offset in the long run by making improvements to the compiler easier by having a more appropriate data structure.

Alternatives

Leave things as they are.

Skip the HIR and lower straight to a MIR later in compilation. This has advantages which adding a HIR does not have, however, it is a far more complex refactoring and also misses some benefits of the HIR, notably being able to stabilise the AST for tools and syntax extensions without locking in the compiler.

Unresolved questions

How to deal with spans and source code. We could keep the AST around and reference back to it from the HIR. Or we could copy span information to the HIR (I plan on doing this initially). Possibly some other solution like keeping the span info in a side table (note that we need less span info in the compiler than we do in libsyntax, which is in turn less than tools want).