Barbara wants to use GhostCell-like cell borrowing with futures
🚧 Warning: Draft status 🚧
This is a draft "status quo" story submitted as part of the brainstorming period. It is derived from real-life experiences of actual Rust users and is meant to reflect some of the challenges that Async Rust programmers face today.
If you would like to expand on this story, or adjust the answers to the FAQ, feel free to open a PR making edits (but keep in mind that, as they reflect peoples' experiences, status quo stories cannot be wrong, only inaccurate). Alternatively, you may wish to add your own status quo story!
The story
Barbara quite likes using statically-checked cell borrowing. "Cell"
in Rust terminology refers to types like Cell
or RefCell
that
enable interior mutability, i.e. modifying or mutably borrowing stuff
even if you've only got an immutable reference to it.
Statically-checked cell borrowing is a technique whereby one object
(an "owner") acts as a gatekeeper for borrow-access to a set of other
objects ("cells"). So if you have mutable borrow access to the owner,
you can temporarily transfer that mutable borrow access to a cell in
order to modify it. This is all checked at compile-time, hence
"statically-checked".
In comparison RefCell
does borrow-checking, but it is checked at
runtime and it will panic if you make a coding mistake. The advantage
of statically-checked borrowing is that it cannot panic at runtime,
i.e. all your borrowing bugs show up at compile time. The history
goes way back, and the technique has been reinvented at least 2-3
times as far as Barbara is aware. This is implemented in various
forms in GhostCell and
qcell
.
Barbara would like to use statically-checked cell borrowing within
futures, but there is no way to get the owner borrow through the
Future::poll
call, i.e. there is no argument or object that the
runtime could save the borrow in. Mostly this does not cause a
problem, because there are other ways for a runtime to share data,
e.g. data can be incorporated into the future when it is created.
However in this specific case, for the specific technique of
statically-checked cell borrows, we need an active borrow to the owner
to be passed down the call stack through all the poll calls.
So Barbara is forced to use RefCell
instead and be very careful not
to cause panics. This seems like a step back. It feels dangerous to
use RefCell
and to have to manually verify that her cell borrows are
panic-free.
There are good habits that you can adopt to offset the dangers, of
course. If you are very careful to make sure that you call no other
method or function which might in turn call code which might attempt
to get another borrow on the same cell, then the RefCell::borrow_mut
panics can be avoided. However this is easy to overlook, and it is
easy to fail to anticipate what indirect calls will be made by a given
call, and of course this may change later on due to maintenance and
new features. A borrow may stay active longer than expected, so calls
which appear safe might actually panic. Sometimes it's necessary to
manually drop the borrow to be sure. In addition you'll never know
what indirect calls might be made until all the possible code-paths
have been explored, either through testing or through running in
production.
So Barbara prefers to avoid all these problems, and use statically-checked cell borrowing where possible.
Example 1: Accessing an object shared outside the runtime
In this minimized example of code to interface a stream to code
outside of the async/await system, the buffer has to be accessible
from both the stream and the outside code, so it is handled as a
Rc<RefCell<StreamBuffer<T>>>
.
#![allow(unused)] fn main() { pub struct StreamPipe<T> { buf: Rc<RefCell<StreamBuffer<T>>>, req_more: Rc<dyn Fn()>, } impl<T> Stream for StreamPipe<T> { type Item = T; fn poll_next(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<Option<T>> { let mut buf = self.buf.borrow_mut(); if let Some(item) = buf.value.take() { return Poll::Ready(Some(item)); } if buf.end { return Poll::Ready(None); } (self.req_more)(); // Callback to request more data Poll::Pending } } }
Probably req_more()
has to schedule some background operation, but
if it doesn't and attempts to modify the shared buf
immediately then
we get a panic, because buf
is still borrowed. The real life code
could be a lot more complicated, and the required combination of
conditions might be harder to hit in testing.
With statically-checked borrowing, the borrow would be something like
let mut buf = self.buf.rw(cx);
, and the req_more
call would either
have to take the cx
as an argument (forcing the previous borrow to
end) or would not take cx
, meaning that it would always have to
defer the access to the buffer to other code, because without the cx
there is no possible way to access the buffer.
Example 2: Shared monitoring data
In this example, the app keeps tallies of various things in a
Monitor
structure. This might be data in/out, number of errors
detected, maybe a hashmap of current links, etc. Since it is accessed
from various components, it is kept behind an Rc<RefCell<_>>
.
// Dependency: futures-lite = "1.11.3" use std::cell::RefCell; use std::rc::Rc; fn main() { let monitor0 = Rc::new(RefCell::new(Monitor { count: 0 })); let monitor1 = monitor0.clone(); let fut0 = async move { let mut borrow = monitor0.borrow_mut(); borrow.count += 1; }; let fut1 = async move { let mut borrow = monitor1.borrow_mut(); borrow.count += 1; fut0.await; }; futures_lite::future::block_on(fut1); } struct Monitor { count: usize, }
The problem is that this panics with a borrowing error because the
borrow is still active when the fut0.await
executes and attempts
another borrow. The solution is to remember to drop the borrow before
awaiting.
In this example code the bug is obvious, but in real life maybe fut0
only borrows in rare situations, e.g. when an error is detected. Or
maybe the future that borrows is several calls away down the
callstack.
With statically-checked borrowing, there is a slight problem in that
currently there is no way to access the poll context from async {}
code. But if there was then the borrow would be something like let mut borrow = monitor1.rw(cx);
, and since the fut0.await
implicitly
requires the cx
in order to poll, the borrow would be forced to end
at that point.
Further investigation by Barbara
The mechanism
Barbara understands that statically-checked cell borrows work by
having an owner held by the runtime, and various instances of a cell
held by things running on top of the runtime (these cells would
typically be behind Rc
references). A mutable borrow on the owner
is passed down the stack, which enables safe borrows on all the cells,
since a mutable borrow on a cell is enabled by temporarily holding
onto the mutable borrow of the owner, which is all checked at
compile-time.
So the mutable owner borrow needs to be passed through the poll
call, and Barbara realizes that this would require support from the
standard library.
Right now a &mut Context<'_>
is passed to poll
, and so within
Context
would be the ideal place to hold a borrow on the cell owner.
However as far as Barbara can see there are difficulties with all the
current implementations:
-
GhostCell (or qcell::LCell) may be the best available solution, because it doesn't have any restrictions on how many runtimes might be running or how they might be nested. But Rust insists that the lifetimes
<'id>
on methods and types are explicit, so it seems like that would force a change to the signature ofpoll
, which would break the ecosystem.Here Barbara experiments with a working example of a modified Future trait and a future implementation that makes use of LCell:
// Requires dependency: qcell = "0.4" use qcell::{LCell, LCellOwner}; use std::pin::Pin; use std::rc::Rc; use std::task::Poll; struct Context<'id, 'a> { cell_owner: &'a mut LCellOwner<'id>, } struct AsyncCell<'id, T>(LCell<'id, T>); impl<'id, T> AsyncCell<'id, T> { pub fn new(value: T) -> Self { Self(LCell::new(value)) } pub fn rw<'a, 'b: 'a>(&'a self, cx: &'a mut Context<'id, 'b>) -> &'a mut T { cx.cell_owner.rw(&self.0) } } trait Future<'id> { type Output; fn poll(self: Pin<&mut Self>, cx: &mut Context<'id, '_>) -> Poll<Self::Output>; } struct MyFuture<'id> { count: Rc<AsyncCell<'id, usize>>, } impl<'id> Future<'id> for MyFuture<'id> { type Output = (); fn poll(self: Pin<&mut Self>, cx: &mut Context<'id, '_>) -> Poll<Self::Output> { *self.count.rw(cx) += 1; Poll::Ready(()) } } fn main() { LCellOwner::scope(|mut owner| { let mut cx = Context { cell_owner: &mut owner }; let count = Rc::new(AsyncCell::new(0_usize)); let mut fut = Box::pin(MyFuture { count: count.clone() }); let _ = fut.as_mut().poll(&mut cx); assert_eq!(1, *count.rw(&mut cx)); }); }
-
The other
qcell
types (QCell, TCell and TLCell) have various restrictions or overheads which might make them unsuitable as a general-purpose solution in the standard library. However they do have the positive feature of not requiring any change in the signature ofpoll
. It looks like they could be added toContext
without breaking anything.Here Barbara tries using
TLCell
, and finds that the signature ofpoll
doesn't need to change:
// Requires dependency: qcell = "0.4" use qcell::{TLCell, TLCellOwner}; use std::pin::Pin; use std::rc::Rc; use std::task::Poll; struct AsyncMarker; struct Context<'a> { cell_owner: &'a mut TLCellOwner<AsyncMarker>, } struct AsyncCell<T>(TLCell<AsyncMarker, T>); impl<T> AsyncCell<T> { pub fn new(value: T) -> Self { Self(TLCell::new(value)) } pub fn rw<'a, 'b: 'a>(&'a self, cx: &'a mut Context<'b>) -> &'a mut T { cx.cell_owner.rw(&self.0) } } trait Future { type Output; fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>; } struct MyFuture { count: Rc<AsyncCell<usize>>, } impl Future for MyFuture { type Output = (); fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> { *self.count.rw(cx) += 1; Poll::Ready(()) } } fn main() { let mut owner = TLCellOwner::new(); let mut cx = Context { cell_owner: &mut owner }; let count = Rc::new(AsyncCell::new(0_usize)); let mut fut = Box::pin(MyFuture { count: count.clone() }); let _ = fut.as_mut().poll(&mut cx); assert_eq!(1, *count.rw(&mut cx)); }
(For comparison, TCell
only allows one owner per marker type in
the whole process. QCell
allows many owners, but requires a
runtime check to make sure you're using the right owner to access a
cell. TLCell
allows only one owner per thread per marker type,
but also lets cells migrate between threads and be borrowed locally,
which the others don't -- see qcell
docs.)
So the choice is GhostCell/LCell and lifetimes everywhere, or various other cell types that may be too restrictive.
Right now Barbara thinks that none of these solutions is likely to be acceptable for the standard library. However still it is a desirable feature, so maybe someone can think of a way around the problems. Or maybe someone has a different perspective on what would be acceptable.
Proof of concept
The Stakker runtime makes use of qcell-based statically-checked cell borrowing. It uses this to get zero-cost access to actors, guaranteeing at compile time that no actor can access any other actor's state. It also uses it to allow inter-actor shared state to be accessed safely and zero-cost, without RefCell.
(For example within a Stakker actor, you can access the contents of a
Share<T>
via the actor context cx
as follows: share.rw(cx)
,
which blocks borrowing or accessing cx
until that borrow on share
has been released. Share<T>
is effectively a Rc<ShareCell<T>
and
cx
has access to an active borrow on the ShareCellOwner
, just as
in the long examples above.)
Stakker doesn't use GhostCell (LCell) because of the need for <'id>
annotations on methods and types. Instead it uses the other three
cell types according to how many Stakker instances will be run, either
one Stakker instance only, one per thread, or multiple per thread.
This is selected by cargo features.
Switching implementations like this doesn't seem like an option for the standard library.
Way forward
Barbara wonders whether there is any way this can be made to work.
For example, could the compiler derive all those <'id>
annotations
automatically for GhostCell/LCell?
Or for multi-threaded runtimes, would
qcell::TLCell
be acceptable?
This allows a single cell-owner in every thread. So it would not
allow nested runtimes of the same type. However it does allow borrows
to happen at the same time independently in different threads, and it
also allows the migration of cells between threads, which is safe
because that kind of cell isn't Sync
.
Or is there some other form of cell-borrowing that could be devised that would work better for this?
The interface between cells and Context
should be straightforward
once a particular cell type is demonstrated to be workable with the
poll
interface and futures ecosystem. For example copying the API
style of Stakker:
let rc = Rc::new(AsyncCell::new(1_u32));
*rc.rw(cx) = 2;
So logically you obtain read-write access to a cell by naming the
authority by which you claim access, in this case the poll context.
In this case it really is naming rather than accessing since the
checks are done at compile time and the address that cx
represents
doesn't actually get passed anywhere or evaluated, once inlining and
optimisation is complete.
🤔 Frequently Asked Questions
What are the morals of the story?
The main problem is that Barbara has got used to a safer environment and it feels dangerous to go back to RefCell and have to manually verify that her cell borrows are panic-free.
What are the sources for this story?
The author of Stakker is trying to interface it to async/await and futures.
Why did you choose Barbara to tell this story?
Barbara has enough Rust knowledge to understand the benefits that GhostCell/qcell-like borrowing might bring.
How would this story have played out differently for the other characters?
The other characters perhaps wouldn't have heard of statically-checked cell borrows so would be unaware of the possibility of making things safer.