The "many modes" pattern
Summary
The "many modes" pattern is being able to take a single function and have it operate in multiple "modes". In the specific case examined here, the chumsky parsing library, GATs were used to make the parsing combinators generic over a mode (produce a result vs do not produce a result). This results in significant speedups, because producing a result when you don't need one is expensive at runtime.
To implement the "many modes" pattern, you often have a type representing each mode:
#![allow(unused)] fn main() { struct ParseMode; struct CheckMode; }
and then with a trait that defines the effect of that type. This trait often has associated types that can, e.g., transform the result of a function executing in that mode. This associate type takes a generic parameter T
representing the return type of the function in question:
#![allow(unused)] fn main() { trait Mode { /// Represents the *actual* output when a function that produces /// `T` is processed in this mode. type Output<T>; } }
Details
This was originally posted as a comment on the issue thread.
The first example I looked at closely was the chumsky parsing library. This is leveraging a pattern that I would call the "many modes" pattern. The idea is that you have some "core function" but you want to execute this function in many different modes. Ideally, you'd like to define the modes independently from the function, and you'd like to be able to add more modes later without having to change the function at all. (If you're familiar with Haskell, monads are an example of this pattern; the monad specifies the "mode" in which some simple sequential function is executed.)
chumsky is a parser combinator library, so the "core function" is a parse function, defined in the Parser
trait. Each Parser
trait impl contains a function that indicates how to parse some particular construct in the grammar. Normally, this parser function builds up a data structure representing the parsed data. But sometimes you don't need the full results of the parse: sometimes you might just like to know if the parse succeeds or fails, without building the parsed version. Thus, the "many modes" pattern: we'd like to be able to define our parser and then execute it against one of two modes, emit or check. The emit mode will build the data structure, but check will just check if the parse succeeds.
In the past, chumsky only had one mode, so they always built the data structure. This could take significant time and memory. Adding the "check" mode let's them skip that, which is a significant performance win. Moreover, the modes are encapsulated within the library traits, and aren't visible to end-users. Nice!
How did chumsky model modes with GATs?
Chumsky added a Mode
trait, encapsulated as part of their internals
module. Instead of directly constructing the results from parsing, the Parser
impls invoke methods on Mode
with closures. This allows the mode to decide which parts of the parsing to execute and which to skip. So, in check mode, the Mode
would decide not to execute the closure that builds the output data structure, for example.
Using this approach, the Parser
trait does indeed have several 'entrypoint' methods, but they are all defaulted and just invoke a common implementation method called go
:
#![allow(unused)] fn main() { pub trait Parser<'a, I: Input + ?Sized, E: Error<I::Token> = (), S: 'a = ()> { type Output; fn parse(&self, input: &'a I) -> Result<Self::Output, E> ... { self.go::<Emit>(...) } fn check(&self, input: &'a I) -> Result<(), E> ... { self.go::<Check>(...) } #[doc(hidden)] fn go<M: Mode>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E> where Self: Sized; } }
Implementations of Parser
just specify the go
method. Note that the impls are, presumably, either contained within chumsky
or generated by chumsky
proc-macros, so the go
method doesn't need to be documented. However, even if go
were documented, the trait bounds certainly look quite reasonable. (The type of inp
is a bit...imposing, admittedly.)
So how is the Mode
trait defined? Just to focus on the GAT, the trait look likes this:
#![allow(unused)] fn main() { pub trait Mode { type Output<T>; ... } }
Here, the T
represents the result type of "some parser parsed in this mode". GATs thus allow us to define a Mode
that is independent from any particular Parser
. There are two impls of Mode
(also internal to chumsky):
Check
, defined likestruct Check; impl Mode for Check { type Output<T> = (); ... }
. In other words, no matter what parser you use,Check
just builds a()
result (success or failure is propagated inepdendently of the mode).Emit
, defined likestruct Emit; impl Mode for Emit { type Output<T> = T; ... }
. InEmit
mode, the output is exactly what the parser generated.
Note that you could, in theory, produce other modes. For example, a Count
mode that not only computes success/failure but counts the number of nodes parsed, or perhaps a mode that computes hashes of the resulting parsed value. Moreover, you could add these modes (and the defaulted methods in Parser
) without breaking any clients.
How could you model this today?
I was trying to think how one might model this problem with traits today. All the options I came up with had significant downsides.
Multiple functions on the trait, or multiple traits. One obvious option would be to use multiple functions in the parse trait, or multiple traits:
#![allow(unused)] fn main() { // Multiple functions trait Parser { fn parse(); fn check(); } // Multiple traits trait Parser: Checker { fn parse(); } trait Checker { fn check(); } }
Both of these approaches mean that defining a new combinator requires writing the same logic twice, once for parse and once for check, but with small variations, which is both annoying and a great opportunity for bugs. It also means that if chumsky ever wanted to define a new mode, they would have to modify every implementation of Parser
(a breaking change, to boot).
Mode with a type parameter. You could try defining a the mode trait with a type parameter, like so...
#![allow(unused)] fn main() { trait ModeFor<T> { type Output; ... } }
The go
function would then look like
#![allow(unused)] fn main() { fn go<M: ModeFor<Self::Output>>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E> where Self: Sized; }
In practice, though, this doesn't really work, for a number of reasons. One of them is that the Mode
trait includes methods like combine
, which take the output of many parsers, not just one, and combine them together. Good luck writing that constraint with ModeFor
. But even ignoring that, lacking HRTB, the signature of go
itself is incomplete. The problem is that, given some impl of Parser
for some parser type MyParser
, MyParser
only knows that M
is a valid mode for its particular output. But maybe MyParser
plans to (internally) use some other parser combinators that produce different kinds of results. Will the mode M
still apply to those? We don't know. We'd have to be able to write a HRTB like for<O> Mode<O>
, which Rust doesn't support yet:
#![allow(unused)] fn main() { fn go<M: for<O> Mode<O>>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E> where Self: Sized; }
But even if Rust did support it, you can see that the Mode<T>
trait doesn't capture the user's intent as closely as the Mode
trait from Chumsky did. The Mode
trait was defined independently from all parsers, which is what we wanted. The Mode<T>
trait is defined relative to some specific parser, and then it falls to the go
function to say "oh, I want this to be a mode for all parsers" using a HRTB.
Using just HRTB (which, against, Rust doesn't have), you could define another trait...
#![allow(unused)] fn main() { trait Mode: for<O> ModeFor<O> {} trait ModeFor<O> {} }
...which would allow us to write M: Mode
on go
against, but it's hard to argue this is simpler than the original GAT variety. This extra ModeFor
trait has a "code smell" to it, it's hard to understand why it is there. Whereas before, you implemented the Mode
trait in just the way you think about it, with a single impl that applies to all parsers...
#![allow(unused)] fn main() { impl Mode for Check { type Output<T> = (); ... } }
...you now write an impl of ModeFor
, where one "instance" of the impl applies to only one parser (which has output type O
). It feels indirect:
#![allow(unused)] fn main() { impl<O> ModeFor<O> for Check { type Output = (); ... } }
How could you model this with RPITIT?
It's also been proposed that we should keep GATs, but only as an implementation detail for things like return position impl Trait in traits (RPITIT) or async functions. This implies that we could model the "many modes" pattern with RPITIT. If you look at the Mode
trait, though, you'll see that this simply doesn't work. Consider the combine
method, which takes the results from two parsers and combines them to form a new result:
#![allow(unused)] fn main() { fn combine<T, U, V, F: FnOnce(T, U) -> V>( x: Self::Output<T>, y: Self::Output<U>, f: F, ) -> Self::Output<V>; }
How could we write this in terms of a function that returns impl Trait
?
Other patterns
In this post, I went through the chumsky pattern in detail. I've not had time to dive quite as deep into other examples, but I've been reading through them and trying to extract out patterns. Here are a few patterns I extracted so far:
Did I miss something?
Maybe you see a way to express the "many modes" pattern (or one of the other patterns I cited) in Rust today that works well? Let me know by commenting on the thread.
(Since posting this, it occurs to me that one could probably use procedural macros to achieve some similar goals, though I think this approach would also have significant downsides.)