Dyn upcast safety
This document proposes a resolution to the last outstanding question blocking dyn
upcast.
Tracking issue: https://github.com/rust-lang/rust/issues/101336
Background
We are trying to enable "upcasts" from a dyn Trait
to its supertraits:
    trait Foo {
        fn foo_method(&self);
    }

    trait Bar: Foo {
        fn bar_method(&self);
    }

    let x: &dyn Bar = /* ... */;
    let y: &dyn Foo = x; // compiles
The key factor for these upcasts is that they require adjusting the vtable. The current implementation strategy is that the vtables for the Bar
trait embed pointers to a Foo
vtable within them:
    +----------------------------+
    | Bar vtable for some type T |
    |----------------------------|
    | Foo vtable                 | ----------> +------------------+
    | foo_method                 | -----+      | Foo vtable for T |
    | bar_method                 | --+  |      |------------------|
    +----------------------------+   |  |      | foo_method       | ---+
                                     |  |      +------------------+    |
                                     |  |                              |
                                     |  +---> <T as Foo>::foo_method <-+
                                     v
                          <T as Bar>::bar_method

(This diagram is only meant to convey the general idea of the vtable layout, and doesn't represent the exact offsets we would use; in fact, with the current implementation, the first supertrait is stored 'inline' and hence no load is required.)
This way, given a &dyn Bar
object, we convert its Bar
vtable to the appropriate Foo
vtable by loading the appropriate field.
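The layout in the diagram can be modelled by hand to make the "load a field" step concrete. The following is only a sketch: every name in it (`FooVtable`, `BarVtable`, `RawDynBar`, `RawDynFoo`, `upcast`) is hypothetical, and it does not reflect the actual vtable format rustc emits.

```rust
// Hand-rolled model of the diagram above; NOT rustc's real vtable layout.
// All type and function names here are illustrative assumptions.

struct FooVtable {
    foo_method: fn(*const ()) -> &'static str,
}

struct BarVtable {
    // Embedded pointer to the supertrait's vtable for the same type.
    foo_vtable: &'static FooVtable,
    foo_method: fn(*const ()) -> &'static str,
    bar_method: fn(*const ()) -> &'static str,
}

// A "wide pointer": a data pointer paired with a vtable pointer.
#[derive(Clone, Copy)]
struct RawDynBar {
    data: *const (),
    vtable: &'static BarVtable,
}

#[derive(Clone, Copy)]
struct RawDynFoo {
    data: *const (),
    vtable: &'static FooVtable,
}

struct T;

fn t_foo(_this: *const ()) -> &'static str {
    "<T as Foo>::foo_method"
}

fn t_bar(_this: *const ()) -> &'static str {
    "<T as Bar>::bar_method"
}

static T_FOO_VTABLE: FooVtable = FooVtable { foo_method: t_foo };
static T_BAR_VTABLE: BarVtable = BarVtable {
    foo_vtable: &T_FOO_VTABLE,
    foo_method: t_foo,
    bar_method: t_bar,
};

// The upcast keeps the data pointer and loads the embedded Foo vtable.
// This load is why the vtable pointer must refer to a real vtable.
fn upcast(bar: RawDynBar) -> RawDynFoo {
    RawDynFoo { data: bar.data, vtable: bar.vtable.foo_vtable }
}

fn demo_model_upcast() -> (&'static str, &'static str) {
    let value = T;
    let bar = RawDynBar {
        data: &value as *const T as *const (),
        vtable: &T_BAR_VTABLE,
    };
    let foo = upcast(bar);
    ((foo.vtable.foo_method)(foo.data), (bar.vtable.bar_method)(bar.data))
}
```

In this model a `RawDynBar` whose `vtable` field held garbage would make `upcast` read through an invalid pointer, which is the hazard the rest of this document is concerned with.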
Although we don't want to commit to a particular implementation strategy, we do want to leave room for this strategy. One implication is that performing an upcast may require loading from the vtable, which implies that the vtable pointer must point to an actual Rust vtable. Although &dyn Bar references always contain a valid vtable, the same is not necessarily true for a raw pointer like *const dyn Bar or *mut dyn Bar.
In the language today, we only support "noop upcasts" that don't affect the vtable, and these are safe (e.g., converting from *const dyn Foo + Send to *const dyn Foo). If we extend the set of upcasts to permit vtable-adjusting upcasts, like *const dyn Bar to *const dyn Foo, this implies that, for safe code at least, all *const dyn Trait values must have a valid vtable, so that we know we can safely load the required field and perform the upcast.
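As a small illustration of a "noop upcast" on raw pointers, the sketch below drops the `Send` auto trait without any unsafe code. The type `Widget` and the function names are made up for the example; note that real Rust syntax needs parentheses, as in `*const (dyn Foo + Send)`.

```rust
trait Foo {
    fn foo_method(&self) -> u32;
}

// Hypothetical implementor, used only for this example.
struct Widget;

impl Foo for Widget {
    fn foo_method(&self) -> u32 {
        7
    }
}

// Removing an auto trait does not change which methods are callable,
// so no vtable adjustment is involved and the coercion is safe.
fn drop_send(p: *const (dyn Foo + Send)) -> *const dyn Foo {
    p
}

fn demo_noop_upcast() -> u32 {
    let w = Widget;
    let with_send: *const (dyn Foo + Send) = &w;
    let plain: *const dyn Foo = drop_send(with_send);
    // SAFETY: `plain` was derived from a reference to `w`, which is still live.
    let r: &dyn Foo = unsafe { &*plain };
    r.foo_method()
}
```

Only the final dereference requires `unsafe`; the cast itself does not.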
On the other hand, we do not generally require raw *mut T pointers to point to valid data. In fact, we explicitly permit them to have any value, including null, and only require that they point to valid data when they are dereferenced. Because dereferencing a raw pointer is an unsafe operation, it has always been considered safe to expose an arbitrary raw pointer to unsafe code -- the unsafety arises when you take a raw pointer from an unknown source and dereference it, since unless you can trace the origin of that pointer you can't possibly guarantee that it is valid to dereference.
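The split between creating and dereferencing raw pointers can be sketched as follows. The function names are illustrative; the address `0x10` is an arbitrary value chosen for the example and is never dereferenced.

```rust
/// Safe: produces pointers that nobody is entitled to dereference.
fn make_untrusted_pointers() -> [*const u32; 2] {
    [std::ptr::null(), 0x10 as *const u32]
}

/// Unsafe: the caller must prove that `p` is valid for reads.
unsafe fn read(p: *const u32) -> u32 {
    unsafe { *p }
}

fn demo_raw_pointers() -> (bool, u32) {
    let untrusted = make_untrusted_pointers();
    let x = 42u32;
    let known_good: *const u32 = &x;
    // SAFETY: `known_good` was just derived from a live reference to `x`.
    let value = unsafe { read(known_good) };
    // Inspecting the pointer value (not the pointee) is always safe.
    (untrusted[0].is_null(), value)
}
```

The obligation sits entirely with whoever dereferences: `read` is sound only because its caller can trace the pointer's origin.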
This brings us to the conflict:
- It has historically been safe to "release" a raw pointer to safe code, but not safe to receive one (since you cannot know if it is valid).
- It has historically been safe to upcast *const dyn values (e.g., *const dyn Foo + Send to *const dyn Foo).
  - Unlike the upcasts we are considering now, this upcast does not require changing the vtable at runtime, but the distinction is subtle for end-users.
  - Moreover, there are future extensions (e.g., upcasting *const dyn Foo + Bar to *const dyn Foo) that would require adjusting the vtable much like the upcasts currently being stabilized.
Related future consideration: virtual method calls on raw pointers
There have been requests to extend traits with the option to include raw pointer methods:
    trait PtrLike {
        fn is_null(v: *const Self) -> bool;
    }
These methods would be useful when writing unsafe code because having an &self method requires satisfying the validity conditions of an &-reference, which may not be possible. If we did have such methods, however, they would raise the question of whether it would be safe to invoke is_null on a *const dyn PtrLike pointer. Just as with upcasting, invoking a method from the vtable requires loading from the vtable, and hence requires a valid vtable generated by the compiler.
The solution we propose in this document also resolves this future dilemma.
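For contrast, a trait like `PtrLike` can already be written and called with static dispatch, where no vtable is consulted and a null pointer is harmless. This is a sketch with a made-up implementor `Node`; the dynamically dispatched call on `*const dyn PtrLike` discussed above is the hypothetical extension and is not shown.

```rust
trait PtrLike {
    // No `self` receiver: the argument is a raw pointer that need not be valid.
    fn is_null(v: *const Self) -> bool;
}

// Hypothetical implementor, used only for this example.
struct Node;

impl PtrLike for Node {
    fn is_null(v: *const Self) -> bool {
        // Calls the inherent raw-pointer method, which never reads the pointee.
        v.is_null()
    }
}

fn demo_ptr_like() -> (bool, bool) {
    let node = Node;
    let live: *const Node = &node;
    let null: *const Node = std::ptr::null();
    // Static dispatch: the callee is known at compile time.
    (<Node as PtrLike>::is_null(live), <Node as PtrLike>::is_null(null))
}
```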
Definitions: validity vs safety invariant
We adopt the terms validity and safety invariant from the unsafe code guidelines:
The validity invariant is an invariant that all data must uphold any time it is accessed or copied in a typed manner. This invariant is known to the compiler and exploited by optimizations such as improved enum layout or eliding in-bounds checks.
The safety invariant is an invariant that safe code may assume all data to uphold. This invariant is used to justify which operations safe code can perform. The safety invariant can be temporarily violated by unsafe code, but must always be upheld when interfacing with unknown safe code. It is not relevant when arguing whether some program has UB, but it is relevant when arguing whether some code safely encapsulates its unsafety -- in other words, it is relevant when arguing whether some library is sound.
In short, the validity invariant defines a condition that must always be true, even in unsafe code, and the safety invariant defines an invariant that unsafe code must guarantee before a value can be released to be used by arbitrary code.
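A library type can illustrate the difference. In the sketch below, the hypothetical type `Even` has a safety invariant ("the wrapped number is even") that is stricter than its validity invariant (that of `u32`, i.e., any initialized value). The type and its methods are assumptions made for illustration.

```rust
/// Safety invariant: the wrapped value is even.
/// Validity invariant: only that of `u32` itself.
pub struct Even(u32);

impl Even {
    /// Safe constructor: checks the safety invariant.
    pub fn new(n: u32) -> Option<Even> {
        if n % 2 == 0 { Some(Even(n)) } else { None }
    }

    /// # Safety
    /// The caller must guarantee that `n` is even.
    pub unsafe fn new_unchecked(n: u32) -> Even {
        Even(n)
    }

    pub fn add_two(&mut self) {
        // Between these two statements the value is odd: the safety
        // invariant is broken, but the value is still a valid `u32`.
        self.0 = self.0.wrapping_add(1);
        // Restored before control returns to arbitrary code.
        self.0 = self.0.wrapping_add(1);
    }

    pub fn get(&self) -> u32 {
        self.0
    }
}
```

An odd value inside `Even` would not by itself be UB, but releasing it to callers that rely on evenness would make the library unsound.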
Contours of the solution space
We can fix this conflict in one of two basic ways:
First, we could make vtable-adjusting upcasts (and *Self method calls) unsafe. This is difficult to implement and would require changes to the Coerce trait, which is already excessively complicated. In exchange, it offers at best marginal benefit: raw *dyn pointers can be released to safe code, but safe code can't do anything interesting with them. For this reason, we do not recommend this option.
Second, if vtable-adjusting casts (and *Self method calls) are safe, then the safety invariant for *dyn types must be that their metadata points to a fully valid vtable (i.e., a vtable created by the compiler). This ensures safe code can perform upcasts or dynamic dispatch. It also implies that std::ptr::null (which is safe) cannot be extended to T where T: ?Sized unless further changes are made, since we would need to provide a valid vtable. It would be possible to permit a sentinel value, like null, but that would imply that upcasting must have a branch, making it less efficient.
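One way to see why a vtable has to come from somewhere: a null *const dyn Foo can be built in safe code today by starting from a thin null pointer to some concrete type and unsizing it. The sketch below assumes a made-up type `Dummy`; the resulting pointer has a null data half but a genuine vtable for `Dummy`.

```rust
trait Foo {
    fn foo_method(&self);
}

// Hypothetical placeholder type that supplies the vtable.
struct Dummy;

impl Foo for Dummy {
    fn foo_method(&self) {}
}

fn null_dyn_foo() -> *const dyn Foo {
    // `null::<Dummy>()` is a thin pointer; the cast attaches Dummy's vtable.
    std::ptr::null::<Dummy>() as *const dyn Foo
}

fn demo_null_dyn() -> bool {
    // `is_null` only inspects the data half of the wide pointer.
    null_dyn_foo().is_null()
}
```

A generic null for unsized `T` has no concrete type to borrow a vtable from, which is the difficulty described above.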
There are, however, various options for the validity invariant, ranging from no invariant to requiring a fully valid vtable at all times. The strict invariants offer some benefits, such as the ability to have a niche for *dyn
pointers. We survey the options here:
Validity invariant for *dyn metadata | Supports niche | Can be initialized with std::mem::zeroed | Constant initializer |
---|---|---|---|
None | | ✅ | ✅ |
Word-aligned | ✅ | ✅ | ✅ |
Word-aligned, non-null | ✅ | | ✅ |
Valid vtable | ✅ | | |
Explanations for the column titles:
- Validity invariant for *dyn metadata -- describes the invariant that applies to the metadata for a *dyn value.
- Supports niche -- true if there is a niche value so that sizeof(Option<*const dyn Foo>) == sizeof(*const dyn Foo).
- Can be initialized with std::mem::zeroed -- true if std::mem::zeroed can be used to create a valid value (very convenient). This makes it trivial to initialize a *const dyn with a dummy value, though the value cannot be released to safe code.
- Constant initializer -- true if there is some constant value for *const dyn Foo that satisfies the validity invariant, no matter the trait Foo. This makes it easy to initialize a *const dyn with a dummy value, though the value cannot be released to safe code.
Other points:
- *dyn values currently have a niche.
- Other pointer-like values (such as fn and &-references) are expected to have a validity invariant of word-aligned, non-null.
- Whichever option we use, we can backwards-compatibly move "up the table" and adopt a less-strict validity invariant without introducing UB into any extant programs.
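Whether a niche is present on a given toolchain can be observed by comparing sizes. The helper below is a hypothetical sketch that only reports the two numbers; it deliberately asserts nothing about their equality, since that depends on the compiler's current layout choices.

```rust
use std::mem::size_of;

trait Foo {}

/// Returns (size of `*const dyn Foo`, size of `Option<*const dyn Foo>`).
/// If the two are equal, the compiler is using a niche in the wide pointer.
fn dyn_pointer_sizes() -> (usize, usize) {
    (
        size_of::<*const dyn Foo>(),
        size_of::<Option<*const dyn Foo>>(),
    )
}
```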
Proposal
The proposal is as follows:
- Vtable-adjusting upcasts are safe operations. The upcast is UB if performed on a value without a valid vtable.
- As such, the "safety invariant" requires a fully valid vtable.
- The "validity invariant" requires *dyn metadata to be word-aligned and non-null.
Vtable-adjusting upcasts are defined as:
- Trait upcasts that alter the set of methods that can be invoked on the resulting value at runtime (e.g., dyn Bar to dyn Foo from the introduction). In particular, upcasts that simply add or remove auto-traits are not vtable-adjusting (e.g., dyn Debug + Send to dyn Debug).
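A vtable-adjusting upcast on a raw pointer might look like the sketch below. It assumes a toolchain in which trait upcasting coercion is available (stable as of Rust 1.86, to my knowledge); the type `S` and the function names are made up for the example.

```rust
trait Foo {
    fn foo_method(&self) -> &'static str;
}

trait Bar: Foo {
    fn bar_method(&self) -> &'static str;
}

// Hypothetical implementor, used only for this example.
struct S;

impl Foo for S {
    fn foo_method(&self) -> &'static str {
        "foo"
    }
}

impl Bar for S {
    fn bar_method(&self) -> &'static str {
        "bar"
    }
}

// Vtable-adjusting: the result must carry a Foo vtable, not a Bar vtable.
// Under the proposal this is a safe operation, with no `unsafe` block.
fn upcast_raw(p: *const dyn Bar) -> *const dyn Foo {
    p
}

fn demo_raw_upcast() -> &'static str {
    let s = S;
    let bar: *const dyn Bar = &s;
    let foo: *const dyn Foo = upcast_raw(bar);
    // SAFETY: `foo` was derived from a reference to `s`, which is still live.
    let r: &dyn Foo = unsafe { &*foo };
    r.foo_method()
}
```

Because `upcast_raw` is safe, any `*const dyn Bar` that reaches it must already carry a compiler-generated vtable, which is what the safety invariant demands.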
This approach...
- permits safe upcasting (and method calls, in the future);
- preserves the existing niche for *const dyn;
- is consistent with other validity requirements for "pointer-like" things such as fn.
The rules also imply that...
- valid (i.e., compiler-generated) vtables are only required for a *dyn pointer when
  - the *dyn pointer is upcast (or has a method invoked on it);
  - or, when the *dyn pointer is released to arbitrary code, because that code may upcast (or invoke methods).
    - By implication, extending std::ptr::null to permit T: ?Sized would not be safe.
Possible future changes...
- without introducing UB, we could loosen the validity invariant if desired, for example to permit NULL as a valid value in unsafe code.
- without introducing UB, we could also loosen the safety invariant to permit a sentinel value (such as NULL), but that would require a branch or other check in the upcast code, which would be less efficient.
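The cost of a sentinel can be sketched with a simplified model in which a vtable reference is either present or absent. The types and functions below are hypothetical and stand in for whatever the compiler would actually emit.

```rust
// Simplified stand-ins for compiler-generated vtables (illustrative only).
struct FooVtable {
    id: u32,
}

struct BarVtable {
    foo_vtable: &'static FooVtable,
}

static FOO: FooVtable = FooVtable { id: 1 };
static BAR: BarVtable = BarVtable { foo_vtable: &FOO };

// Safety invariant "always a real vtable": one unconditional load.
fn upcast_strict(v: &'static BarVtable) -> &'static FooVtable {
    v.foo_vtable
}

// Looser safety invariant with a sentinel (modelled here as `None`):
// the upcast must test for the sentinel before it may load.
fn upcast_with_sentinel(v: Option<&'static BarVtable>) -> Option<&'static FooVtable> {
    match v {
        Some(bar) => Some(bar.foo_vtable),
        None => None,
    }
}
```

The extra `match` in the second function corresponds to the branch that the proposal avoids by requiring a valid vtable in the safety invariant.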
Prior links
- Dyn safety write-up, which includes links to prior write-ups
- Exhaustive set of validity invariants considered