`unsafe_pinned`

Feature Name: unsafe_pinned
Start Date: 2022-11-05
RFC PR: rust-lang/rfcs#3467
Tracking Issue: rust-lang/rust#125735

Summary

Add a type UnsafePinned that indicates to the compiler that this field is “pinned” and there might be pointers elsewhere that point to the same memory. This means, in particular, that &mut UnsafePinned is not necessarily a unique pointer, and thus the compiler cannot just make aliasing assumptions. However, &mut UnsafePinned can still be mem::swaped, so this is not a free ticket for arbitrary aliasing of mutable references. You need to use mechanisms such as Pin to ensure that mutable references cannot be used in incorrect ways by clients.

This type is then used in generator lowering, finally fixing #63818.

Motivation

Let’s say you want to write a type with a self-referential pointer:

#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};

pub struct S {
    data: i32,
    ptr_to_data: *mut i32,
}

impl !Unpin for S {}

impl S {
    pub fn new() -> Self {
        S { data: 42, ptr_to_data: ptr::null_mut() }
    }

    pub fn get_data(self: Pin<&mut Self>) -> i32 {
        // SAFETY: We're not moving anything.
        let this = unsafe { Pin::get_unchecked_mut(self) };
        if this.ptr_to_data.is_null() {
            this.ptr_to_data = ptr::addr_of_mut!(this.data);
        }
        // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
        unsafe { this.ptr_to_data.read() }
    }

    pub fn set_data(self: Pin<&mut Self>, data: i32) {
        // SAFETY: We're not moving anything.
        let this = unsafe { Pin::get_unchecked_mut(self) };
        if this.ptr_to_data.is_null() {
            this.ptr_to_data = ptr::addr_of_mut!(this.data);
        }
        // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
        unsafe { this.ptr_to_data.write(data) }
    }
}

fn main() {
    let mut s = pin!(S::new());
    s.as_mut().set_data(42);
    println!("{}", s.as_mut().get_data());
}

This kind of code is implicitly generated by rustc all the time when an async fn has a local variable of reference type that is live across a yield point. The problem is that this code has UB under our aliasing rules: the &mut S inside the self argument of get_data aliases with ptr_to_data! (If you run this code in Miri, remove the impl !Unpin to see the UB. Miri treats Unpin as magic as otherwise the entire async ecosystem would cause errors. But that is not how Unpin was actually designed.)

This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this – still in the realm of what async fn generates – also have UB under the LLVM aliasing rules.

A more complex variant

The following roughly corresponds to a generator with this code:

let mut data = 0;
let ptr_to_data = &mut data;
yield;
*ptr_to_data = 42;
println!("{}", data);
return;

When implemented by hand, it looks as follows, and causes aliasing issues:

#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};
use std::task::Poll;

pub struct S {
    state: i32,
    data: i32,
    ptr_to_data: *mut i32,
}

impl !Unpin for S {}

impl S {
    pub fn new() -> Self {
        S { state: 0, data: 0, ptr_to_data: ptr::null_mut() }
    }

    fn poll(self: Pin<&mut Self>) -> Poll<()> {
        // SAFETY: We're not moving anything.
        let this = unsafe { Pin::get_unchecked_mut(self) };
        match this.state {
            0 => {
                // The first time, set up the pointer.
                this.ptr_to_data = ptr::addr_of_mut!(this.data);
                // Now yield.
                this.state += 1;
                Poll::Pending
            }
            1 => {
                // After coming back from the yield, write to the pointer.
                unsafe { this.ptr_to_data.write(42) };
                // And read our local variable `data`.
                // THIS IS UB! `this` is derived from the `noalias` pointer
                // `self` but we did a write to `this.data` in the previous
                // line when writing to `ptr_to_data`. The compiler is allowed
                // to reorder this and the previous line and then the output
                // would change.
                println!("{}", this.data);
                // Now yield and be done.
                this.state += 1;
                Poll::Ready(())
            }
            _ => unreachable!(),
        }
    }
}

fn main() {
    let mut s = pin!(S::new());
    while let Poll::Pending = s.as_mut().poll() {}
}

Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements. When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements. This is an example of an intrusive list in the standard library that is breaking Rust’s aliasing rules. Pin is sometimes used to ensure that the list elements don’t just move elsewhere (which would invalidate those incoming pointers) and provide a safe API, but there still is the problem that an &mut Node is actually not a unique pointer due to these aliases – so we need a way for the to opt-out of the aliasing rules.

The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB. We don’t want to change the rules for mutable references in general (that would also affect all the code that doesn’t do anything self-referential), instad we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations.

Guide-level explanation

To write this code in a UB-free way, wrap the fields that are targets of self-referential pointers in an UnsafePinned:

pub struct S {
    data: UnsafePinned<i32>, // <!---- here
    ptr_to_data: *mut i32,
}

impl S {
    pub fn new() -> Self {
        S { data: UnsafePinned::new(42), ptr_to_data: ptr::null_mut() }
    }

    pub fn get_data(self: Pin<&mut Self>) -> i32 {
        // SAFETY: We're not moving anything.
        let this = unsafe { Pin::get_unchecked_mut(self) };
        if this.ptr_to_data.is_null() {
            this.ptr_to_data = UnsafePinned::raw_get_mut(ptr::addr_of_mut!(this.data));
        }
        // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
        unsafe { this.ptr_to_data.read() }
    }

    pub fn set_data(self: Pin<&mut Self>, data: i32) {
        // SAFETY: We're not moving anything.
        let this = unsafe { Pin::get_unchecked_mut(self) };
        if this.ptr_to_data.is_null() {
            this.ptr_to_data = UnsafePinned::raw_get_mut(ptr::addr_of_mut!(this.data));
        }
        // SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
        unsafe { this.ptr_to_data.write(data) }
    }
}

This lets the compiler know that mutable references to data might still have aliases, and so optimizations cannot assume that no aliases exist. That’s entirely analogous to how UnsafeCell lets the compiler know that shared references to this field might undergo mutation.

However, what is not analogous is that &mut S, when handed to safe code you do not control, must still be unique pointers! This is because of methods like mem::swap that can still assume that their two arguments are non-overlapping. (mem::swap on two &mut UnsafePinned<i32> may soundly assume that they do not alias.) In other words, the safety invariant of &mut S still requires full ownership of the entire memory range S is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory. But again, this is a safety invariant; it only applies to safe code you do not control. You can write your own code handling &mut S and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine.

To hand such references to safe code, use Pin: the type Pin<&mut S> can be safely given to external code, since the Pin wrapper blocks access to operations like mem::swap.

Similarly, the intrusive linked list from the motivation can be fixed by wrapping the entire UnsafeListEntry in UnsafePinned.

Reference-level explanation

API sketch:

/// The type `UnsafePinned<T>` lets unsafe code violate
/// the rule that `&mut UnsafePinned<T>` may never alias anything else.
///
/// However, even if you define your type like `pub struct Wrapper(UnsafePinned<...>)`,
/// it is still very risky to have an `&mut Wrapper` that aliases
/// anything else. Many functions that work generically on `&mut T` assume that the
/// memory that stores `T` is uniquely owned (such as `mem::swap`). In other words,
/// while having aliasing with `&mut Wrapper` is not immediate Undefined
/// Behavior, it is still unsound to expose such a mutable reference to code you do
/// not control! Techniques such as pinning via `Pin` are needed to ensure soundness.
///
/// Similar to `UnsafeCell`, `UnsafePinned` will not usually show up in the public
/// API of a library. It is an internal implementation detail of libraries that
/// need to support aliasing mutable references.
///
/// Further note that this does *not* lift the requirement that shared references
/// must be read-only! Use `UnsafeCell` for that.
///
/// This type blocks niches the same way `UnsafeCell` does.
#[lang = "unsafe_aliased"]
#[repr(transparent)]
struct UnsafePinned<T: ?Sized> {
    value: T,
}

/// When this type is used, that almost certainly means safe APIs need to use pinning
/// to avoid the aliases from becoming invalidated. Therefore let's mark this as `!Unpin`.
impl<T> !Unpin for UnsafePinned<T> {}

/// The type is `Copy` when `T` is to avoid people assuming that `Copy` implies there
/// is no `UnsafePinned` anywhere. (This is an issue with `UnsafeCell`: people use `Copy` bounds
/// to mean `Freeze`.) Given that there is no `unsafe impl Copy for ...`, this is also
/// the option that leaves the user more choices (as they can always wrap this in a `!Copy` type).
impl<T: Copy> Copy for UnsafePinned<T> {}
impl<T: Copy> Clone for UnsafePinned<T> {
    fn clone(&self) -> Self { *self }
}

// `Send` and `Sync` are inherited from `T`. This is similar to `SyncUnsafeCell`, since
// we eventually concluded that `UnsafeCell` implicitly making things `!Sync` is sometimes
// unergonomic. A type that needs to be `!Send`/`!Sync` should really have an explicit
// opt-out itself, e.g. via an `PhantomData<*mut T>` or (one day) via `impl !Send`/`impl !Sync`.

impl<T: ?Sized> UnsafePinned<T> {
    /// Constructs a new instance of `UnsafePinned` which will wrap the specified
    /// value.
    pub fn new(value: T) -> UnsafePinned<T> where T: Sized {
        UnsafePinned { value }
    }

    pub fn into_inner(self) -> T where T: Sized {
        self.value
    }

    /// Get read-write access to the contents of an `UnsafePinned`.
    ///
    /// You should usually be using `get_mut_pinned` instead to explicitly track
    /// the fact that this memory is "pinned" due to there being aliases.
    pub fn get_mut_unchecked(&mut self) -> *mut T {
        ptr::addr_of_mut!(self.value)
    }

    /// Get read-write access to the contents of a pinned `UnsafePinned`.
    pub fn get_mut_pinned(self: Pin<&mut Self>) -> *mut T {
        // SAFETY: we're not using `get_unchecked_mut` to unpin anything
        unsafe { ptr::addr_of_mut!(self.get_unchecked_mut().value) }
    }

    /// Get read-only access to the contents of a shared `UnsafePinned`.
    /// Note that `&UnsafePinned<T>` is read-only if `&T` is read-only.
    /// This means that if there is mutation of the `T`, future reads from the
    /// `*const T` returned here are UB!
    ///
    /// ```rust
    /// unsafe {
    ///     let mut x = UnsafePinned::new(0);
    ///     let ref1 = &mut *addr_of_mut!(x);
    ///     let ref2 = &mut *addr_of_mut!(x);
    ///     let ptr = ref1.get(); // read-only pointer, assumes immutability
    ///     ref2.get_mut().write(1);
    ///     ptr.read(); // UB!
    /// }
    /// ```
    pub fn get(&self) -> *const T {
        self as *const _ as *const T
    }

    pub fn raw_get_mut(this: *mut Self) -> *mut T {
        this as *mut T
    }

    pub fn raw_get(this: *const Self) -> *const T {
        this as *const T
    }
}

The comment about aliasing &mut being “risky” refers to the fact that their safety invariant still asserts exclusive ownership. This implies that duplicate in the following example, while not causing immediate UB, is still unsound:

pub struct S {
    data: UnsafePinned<i32>,
}

impl S {
    fn new(x: i32) -> Self {
        S { data: UnsafePinned::new(x) }
    }

    fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) {
        let s1 = unsafe { (&s).transmute_copy() };
        let s2 = s;
        (s1, s2)
    }
}

The unsoundness is easily demonstrated by using safe code to cause UB:

let mut s = S::new(42);
let (s1, s2) = s.duplicate(); // no UB
mem::swap(s1, s2); // UB

We could even soundly make get_mut_unchecked return an &mut T, given that the safety invariant is not affected by UnsafePinned. But that would probably not be useful and only cause confusion.

Here is a polyfill on current Rust that uses the Unpin hack to achieve mostly the same effect as this API. (“Mostly” because a safe impl Unpin for ... can un-do the effect of this, which would not be the case with the real UnsafePinned.)

Reference diff:

  * Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
-   [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
+   [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or
+   the `&mut T` contains an [`UnsafePinned<U>`].

Async generator lowering changes:

Fields that represent local variables whose address is taken across a yield point must be wrapped in UnsafePinned.

rustc and Miri changes:

We have a UnsafeUnpin auto trait similar to Freeze that is implemented if the type does not contain any by-val UnsafePinned. This trait is an internal implementation detail and (for now) not exposed to users.
noalias on mutable references is only emitted for UnsafeUnpin types. (This replaces the current hack where it is only emitted for Unpin types.)
Niches are blocked on UnsafePinned.
Miri will do SharedReadWrite retagging inside UnsafePinned similar to what it does inside UnsafeCell already. (This replaces the current Unpin-based hack.)

Comparison with some other types that affect aliasing

UnsafeCell: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. &UnsafeCell<T> is special. UnsafeCell<&T> (by-val, fully owned) is not special at all and basically like &T; &mut UnsafeCell<T> is also not special. Safe wrappers around this type can expose mutability behind shared references, such as &RefCell<T>.
UnsafePinned: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. &mut UnsafePinned<T> is special. UnsafePinned<&mut T> (by-val, fully owned) is not special at all and basically like &mut T; &UnsafePinned<T> is also not special. Safe wrappers around this type can expose sharing that involves pinned mutable references, such as Pin<&mut MyFuture>.
MaybeDangling: disables aliasing and dereferencable of all references (and boxes) directly inside it, i.e. MaybeDangling<&[mut] T> is special. &[mut] MaybeDangling<T> is not special at all and basically like &[mut] T.

Drawbacks

It’s yet another wrapper type adjusting our aliasing rules and very easy to mix up with UnsafeCell or MaybeDangling. Furthermore, it is an extremely subtle wrapper type, as the duplicate example shows.
UnsafeUnpin is a somewhat unfortunate twin to Unpin. The purpose of UnsafeUnpin really is only to search for UnsafePinned fields, so that we can use the trait solver to determine whether an &mut reference gets noalias or not. The actual safety promise of UnsafeUnpin is likely going to be exactly the same as Unpin, but we can’t use a stable and safe trait to determine noalias: impl UnsafeUnpin for T would add the noalias back to &mut T, and that can lead to very surprising aliasing issues as the poll_fn debacle showed. (Note that PollFn has already been fixed, but that doesn’t mean nobody will make similar mistakes in the future so it is worth discussing how the original, problematic PollFn would fare under this RFC.) Splitting up the traits partially mitigates such issues: after impl<T> Unpin for PollFn<T>, PollFn<T> is (in general) Unpin + !UnsafeUnpin. The known examples of UB that Miri found all were caused by bad aliasing assumptions, which no longer occur when the aliasing assumptions are tied to UnsafeUnpin rather than Unpin. Actually moving the PollFn would still cause problems (and that can be done in safe code since it implements Unpin), but now the chances of code causing UB are much reduced since one must both pin data that’s moved into a closure, and move that closure – even though the Rust compiler will not help prevent such moves, programmers thinking carefully about pinning are hopefully less likely to then try to move that closure. In conclusion, Unpin + !UnsafeUnpin types are somewhat foot-gunny but less foot-gunny than the status quo. Maybe in a future edition Unpin can be transitioned to an unsafe trait and then this situation can be re-evaluated; for now, UnsafeUnpin remains an unstable implementation detail similar to Freeze. (UnsafeUnpin + !Unpin types are harmless, one just loses the ability to call Pin::deref_mut for no good reason.)
It is unfortunate that &mut UnsafePinned and &mut TypeThatContainsUnsafePinned lose their no-alias assumptions even when they are not currently pinned. However, since pinning was implemented as a library concept, there’s not really any way the compiler can know whether an &mut UnsafePinned is pinned or not – working with Pin<&mut TypeThatContainsUnsafePinned> generally requires using Pin::get_unchecked_mut and Pin::map_unchecked_mut, which exposes &mut TypeThatContainsUnsafePinned and &mut UnsafePinned that still need to be considered aliased.

Rationale and alternatives

The proposal in this RFC matches what was discussed with the lang team a long time ago.

However, of course one could imagine alternatives:

Keep the status quo. The current sitaution is that we only make aliasing requirements on mutable references if the type they point to is Unpin. This is unsatisfying: Unpin was never meant to have this job. A consequence is that a stray impl Unpin on a Wrapper<T>-style type can lead to subtle miscompilations since it re-adds aliasing requirements for the inner T. Contrast this with the UnsafeCell situation, where it is not possible for (stable) code to just impl Freeze for T in the wrong way – UnsafeCell is always recognized by the compiler.

On the other hand, UnsafePinned is rather quirky in its behavior and having two marker traits (UnsafeUnpin and Unpin) might be too confusing, so sticking with Unpin might not be too bad in comparison.

If we do that, however, it seems preferrable to transition Unpin to an unsafe trait. There is a clear statement about the types’ invariants associated with Unpin, so an impl Unpin already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisifed. This is mostly a coincidence and related to the fact that we don’t have safe field projections on Pin. That said, solving this also requires solving the trouble around Drop and Pin, where effectively an impl Drop does an implicit get_mut_unchecked, i.e., implicitly assumes the type is Unpin.
UnsafePinned could affect aliasing guarantees both on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing &mut UnsafePinned<T> is cast or coerced to &UnsafePinned<T>: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more ‘powerful’ than UnsafeCell in the sense that replacing UnsafeCell by UnsafePinned would always be correct. (Under the RFC as written, UnsafeCell and UnsafePinned can be nested to remove aliasing requirements from both shared and mutable references.)

If we don’t do this, we could consider removing get since it seems too much like a foot-gun. But that makes shared references to UnsafePinned fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are.
Instead of introducing a new type, we could say that UnsafeCell affects both shared and mutable references. That would lose some optimization potential on types like &mut Cell<T>, but would avoid the footgun of coercing an &mut UnsafePinned<T> to &UnsafePinned<T>. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is able to detect such issues).
We could entirely avoid all these problems by not having aliasing restrictions on mutable references. But that is completely against the direction Rust has had for 8 years now, and it would mean removing LLVM noalias annotations for mutable references (and likely boxes) entirely. That is sacrificing optimization potential for the common case in favor of simplifying niche cases such as self-referential structs – which is against the usual design philosophy of Rust.
Instead of adding a new type that needs to be used as Pin<&mut UnsafePinned<T>>, can’t we just make Pin<&mut T> special? The answer is no, because working with Pin<&mut T> in unsafe code usually involves getting direct access to the &mut and then using it “carefully”. But being careful is not enough when the compiler makes non-aliasing assumptions! We need to preserve the fact that the &mut T may have aliases even after Pin::get_unchecked_mut was used and inside Pin::map_unchecked_mut. In a different universe where pinning is a first-class concept with native support for projections and no need for get_unchecked_mut, this might not have been required, but with pinning being introduced as a library type, there is no (currently known) alternative to UnsafePinned.

In terms of rationale, the question that comes to mind first is why is this so different from UnsafeCell. UnsafeCell opts-out of read-only guarantees for shared references, can’t we just have a type that opts-out of uniqueness guarantees for mutable references? The answer is no, because mutable references have some universal operations that exploit their uniqueness – in particular, mem::swap. In contrast, there exists no operation available on all shared references that exploits their immutability. This is why we need pinning to make APIs around UnsafePinned actually sound.

Naming

An earlier proposal suggested to call the type UnsafeAliased, since the type is not inherently tied to pinning. However, it is not possible to write sound wrappers around UnsafeAliased such that we can have aliasing &mut MyType. One has to use pinning for that: Pin<&mut MyType>. Because of that, the RFC proposes that we suggestively put pinning into the name of the type, so that people don’t confuse it with a general mechanism for aliasing mutable references. It is more like the core primitive behind pinning, where whenever a type is pinned that is caused by an UnsafePinned field somewhere inside it.

For instance, it may be tempting to use an UnsafeAliased type to mark a single field in some struct as “separately aliased”, and then a Mutex<Struct> would acquire ownership of the entire struct except for that field. However, due to mem::swap, that would not be sound. One cannot hand out an &mut to such aliased memory as part of a safe-to-use abstraction – except by using pinning.

Prior art

This is somewhat like UnsafeCell, but for mutable instead of shared references.

Adding something like this to Rust has been discussed many times throughout the years. Here are some links for recent discussions:

Unresolved questions

Is there a better name than UnsafePinned for the type? What about the trait – UnsafeUnpin is awkward. Maybe Unique is better?
How do we transition code that relies on Unpin opting-out of aliasing guarantees to the new type? Here’s a plan: define the PhantomPinned type as UnsafePinned<()>. Places in the standard library that use impl !Unpin for and the generator lowering are adjusted to use UnsafePinned instead. Then as long as nobody outside the standard library used the unstable impl !Unpin for, switching the noalias-opt-out to the UnsafeUnpin trait is actually backwards compatible with the (never explicitly supported) Unpin hack! However, if we ever make UnsafePinned location-relevant (i.e., only data inside the UnsafePinned is allowed to have aliases), then unsafe code in the ecosystem needs to be adjusted. The justification for this is that currently this code relies on undocumented unstable behavior (using !Unpin to opt-out of aliasing guarantees), so we are in our right to declare it unsound. Of course we should give the ecosystem time to migrate to the new approach before we actually start doing optimizations that exploit this UB.
Relatedly, in which module should this type live?
Unpin also affects the dereferenceable attribute, so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back dereferenceable when better semantics for it materialize on the LLVM side?

Future possibilities

Similar to how we might want the ability to project from &UnsafeCell<Struct> to &UnsafeCell<Field>, the same kind of projection could also be interesting for UnsafePinned.

The Rust RFC Book