- Feature Name:
shared_from_slice
- Start Date: 2017-01-05
- RFC PR: rust-lang/rfcs#1845
- Rust Issue: rust-lang/rust#40475
Summary
This is an RFC to add the APIs: From<&[T]> for Rc<[T]>
where T: Clone
or T: Copy
as well as From<&str> for Rc<str>
. In addition: From<Vec<T>> for Rc<[T]>
and From<Box<T: ?Sized>> for Rc<T>
will be added.
Identical APIs will also be added for Arc
.
Motivation
Caching and string interning
These, and especially the latter - i.e: From<&str>
, trait implementations of From
are useful when dealing with any form of caching of slices.
This especially applies to controllable string interning, where you can cheaply cache strings with a construct such as putting Rc
s into HashSet
s, i.e: HashSet<Rc<str>>
.
An example of string interning:
#![feature(ptr_eq)]
#![feature(shared_from_slice)]
use std::rc::Rc;
use std::collections::HashSet;
use std::mem::drop;
fn cache_str(cache: &mut HashSet<Rc<str>>, input: &str) -> Rc<str> {
// If the input hasn't been cached, do it:
if !cache.contains(input) {
cache.insert(input.into());
}
// Retrieve the cached element.
cache.get(input).unwrap().clone()
}
let first = "hello world!";
let second = "goodbye!";
let mut set = HashSet::new();
// Cache the slices:
let rc_first = cache_str(&mut set, first);
let rc_second = cache_str(&mut set, second);
let rc_third = cache_str(&mut set, second);
// The contents match:
assert_eq!(rc_first.as_ref(), first);
assert_eq!(rc_second.as_ref(), second);
assert_eq!(rc_third.as_ref(), rc_second.as_ref());
// It was cached:
assert_eq!(set.len(), 2);
drop(set);
assert_eq!(Rc::strong_count(&rc_first), 1);
assert_eq!(Rc::strong_count(&rc_second), 2);
assert_eq!(Rc::strong_count(&rc_third), 2);
assert!(Rc::ptr_eq(&rc_second, &rc_third));
One could imagine a scenario where you have an AST with string literals that gets repeated a lot in it. For example, namespaces in XML documents tends to be repeated many times.
The tendril crate does one form of interning:
Buffer sharing is accomplished through thread-local (non-atomic) reference counting
It is useful to provide an implementation of From<&[T]>
as well, and not just for &str
, because one might deal with non-utf8 strings, i.e: &[u8]
. One could potentially reuse this for Path
, OsStr
.
Safe abstraction for unsafe
code.
Providing these implementations in the current state of Rust requires substantial amount of unsafe
code. Therefore, for the sake of confidence in that the implementations are safe - it is best done in the standard library.
RcBox
is not public
Furthermore, since RcBox
is not exposed publicly from std::rc
, one can’t make an implementation outside of the standard library for this without making assumptions about the internal layout of Rc
. The alternative is to roll your own implementation of Rc
in its entirety - but this in turn requires using a lot of feature gates, which makes using this on stable Rust in the near future unfeasible.
For Arc
For Arc
the synchronization overhead of doing .clone()
is probably greater than the overhead of doing Arc<Box<str>>
. But once the clones have been made, Arc<str>
would probably be cheaper to dereference due to locality.
Most of the motivations for Rc
applies to Arc
as well, but the use cases might be fewer. Therefore, the case for adding the same API for Arc
is less clear. One could perhaps use it for multi threaded interning with a type such as: Arc<Mutex<HashSet<Arc<str>>>>
.
Because of the similarities between the layout of Rc
and Arc
, almost identical implementations could be added for From<&[T]> for Arc<[T]>
and From<&str> for Arc<str>
. It would also be consistent to do so.
Taking all of this into account, adding the APIs for Arc
is warranted.
Detailed design
There’s already an implementation
There is already an implementation of sorts alloc::rc
for this. But it is hidden under the feature gate rustc_private
, which, to the authors knowledge, will never be stabilized. The implementation is, on this day, as follows:
impl Rc<str> {
/// Constructs a new `Rc<str>` from a string slice.
#[doc(hidden)]
#[unstable(feature = "rustc_private",
reason = "for internal use in rustc",
issue = "0")]
pub fn __from_str(value: &str) -> Rc<str> {
unsafe {
// Allocate enough space for `RcBox<str>`.
let aligned_len = 2 + (value.len() + size_of::<usize>() - 1) / size_of::<usize>();
let vec = RawVec::<usize>::with_capacity(aligned_len);
let ptr = vec.ptr();
forget(vec);
// Initialize fields of `RcBox<str>`.
*ptr.offset(0) = 1; // strong: Cell::new(1)
*ptr.offset(1) = 1; // weak: Cell::new(1)
ptr::copy_nonoverlapping(value.as_ptr(), ptr.offset(2) as *mut u8, value.len());
// Combine the allocation address and the string length into a fat pointer to `RcBox`.
let rcbox_ptr: *mut RcBox<str> = mem::transmute([ptr as usize, value.len()]);
assert!(aligned_len * size_of::<usize>() == size_of_val(&*rcbox_ptr));
Rc { ptr: Shared::new(rcbox_ptr) }
}
}
}
The idea is to use the bulk of the implementation of that, generalize it to Vec
s and slices, specialize it for &str
, provide documentation for both.
Copy
and Clone
For the implementation of From<&[T]> for Rc<[T]>
, T
must be Copy
if ptr::copy_nonoverlapping
is used because this relies on it being memory safe to simply copy the bits over. If instead, T::clone()
is used in a loop, then T
can simply be Clone
instead. This is however slower than using ptr::copy_nonoverlapping
.
Vec
and Box
For the implementation of From<Vec<T>> for Rc<[T]>
, T
need not be Copy
, nor Clone
. The input vector already owns valid T
s, and these elements are simply copied over bit for bit. After copying all elements, they are no longer
owned in the vector, which is then deallocated. Unfortunately, at this stage, the memory used by the vector can not be reused - this could potentially be changed in the future.
This is similar for Box
.
Suggested implementation
The actual implementations could / will look something like:
For Rc
#[inline(always)]
unsafe fn slice_to_rc<'a, T, U, W, C>(src: &'a [T], cast: C, write_elems: W)
-> Rc<U>
where U: ?Sized,
W: FnOnce(&mut [T], &[T]),
C: FnOnce(*mut RcBox<[T]>) -> *mut RcBox<U> {
// Compute space to allocate for `RcBox<U>`.
let susize = mem::size_of::<usize>();
let aligned_len = 2 + (mem::size_of_val(src) + susize - 1) / susize;
// Allocate enough space for `RcBox<U>`.
let vec = RawVec::<usize>::with_capacity(aligned_len);
let ptr = vec.ptr();
forget(vec);
// Combine the allocation address and the slice length into a
// fat pointer to RcBox<[T]>.
let rbp = slice::from_raw_parts_mut(ptr as *mut T, src.len())
as *mut [T] as *mut RcBox<[T]>;
// Initialize fields of RcBox<[T]>.
(*rbp).strong.set(1);
(*rbp).weak.set(1);
write_elems(&mut (*rbp).value, src);
// Recast to RcBox<U> and yield the Rc:
let rcbox_ptr = cast(rbp);
assert_eq!(aligned_len * susize, mem::size_of_val(&*rcbox_ptr));
Rc { ptr: Shared::new(rcbox_ptr) }
}
#[unstable(feature = "shared_from_slice",
reason = "TODO",
issue = "TODO")]
impl<T> From<Vec<T>> for Rc<[T]> {
/// Constructs a new `Rc<[T]>` from a `Vec<T>`.
/// The allocated space of the `Vec<T>` is not reused,
/// but new space is allocated and the old is deallocated.
/// This happens due to the internal layout of `Rc`.
///
/// # Examples
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// let arr = [1, 2, 3];
/// let vec = vec![Box::new(1), Box::new(2), Box::new(3)];
/// let rc: Rc<[Box<usize>]> = Rc::from(vec);
/// assert_eq!(rc.len(), arr.len());
/// for (x, y) in rc.iter().zip(&arr) {
/// assert_eq!(**x, *y);
/// }
/// ```
#[inline]
fn from(mut vec: Vec<T>) -> Self {
unsafe {
let rc = slice_to_rc(vec.as_slice(), |p| p, |dst, src|
ptr::copy_nonoverlapping(
src.as_ptr(), dst.as_mut_ptr(), src.len())
);
// Prevent vec from trying to drop the elements:
vec.set_len(0);
rc
}
}
}
#[unstable(feature = "shared_from_slice",
reason = "TODO",
issue = "TODO")]
impl<'a, T: Clone> From<&'a [T]> for Rc<[T]> {
/// Constructs a new `Rc<[T]>` by cloning all elements from the shared slice
/// [`&[T]`][slice]. The length of the reference counted slice will be exactly
/// the given [slice].
///
/// # Examples
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// #[derive(PartialEq, Clone, Debug)]
/// struct Wrap(u8);
///
/// let arr = [Wrap(1), Wrap(2), Wrap(3)];
/// let rc: Rc<[Wrap]> = Rc::from(arr.as_ref());
/// assert_eq!(rc.as_ref(), &arr); // The elements match.
/// assert_eq!(rc.len(), arr.len()); // The lengths match.
/// ```
///
/// Using the [`Into`][Into] trait:
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// #[derive(PartialEq, Clone, Debug)]
/// struct Wrap(u8);
///
/// let rc: Rc<[Wrap]> = arr.as_ref().into();
/// assert_eq!(rc.as_ref(), &arr); // The elements match.
/// assert_eq!(rc.len(), arr.len()); // The lengths match.
/// ```
///
/// [Into]: https://doc.rust-lang.org/std/convert/trait.Into.html
/// [slice]: https://doc.rust-lang.org/std/primitive.slice.html
#[inline]
default fn from(slice: &'a [T]) -> Self {
unsafe {
slice_to_rc(slice, |p| p, |dst, src| {
for (d, s) in dst.iter_mut().zip(src) {
ptr::write(d, s.clone())
}
})
}
}
}
#[unstable(feature = "shared_from_slice",
reason = "TODO",
issue = "TODO")]
impl<'a, T: Copy> From<&'a [T]> for Rc<[T]> {
/// Constructs a new `Rc<[T]>` from a shared slice [`&[T]`][slice].
/// All elements in the slice are copied and the length is exactly that of
/// the given [slice]. In this case, `T` must be `Copy`.
///
/// # Examples
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// let arr = [1, 2, 3];
/// let rc = Rc::from(arr);
/// assert_eq!(rc.as_ref(), &arr); // The elements match.
/// assert_eq!(rc.len(), arr.len()); // The length is the same.
/// ```
///
/// Using the [`Into`][Into] trait:
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// let arr = [1, 2, 3];
/// let rc: Rc<[u8]> = arr.as_ref().into();
/// assert_eq!(rc.as_ref(), &arr); // The elements match.
/// assert_eq!(rc.len(), arr.len()); // The length is the same.
/// ```
///
/// [Into]: ../../std/convert/trait.Into.html
/// [slice]: ../../std/primitive.slice.html
#[inline]
fn from(slice: &'a [T]) -> Self {
unsafe {
slice_to_rc(slice, |p| p, <[T]>::copy_from_slice)
}
}
}
#[unstable(feature = "shared_from_slice",
reason = "TODO",
issue = "TODO")]
impl<'a> From<&'a str> for Rc<str> {
/// Constructs a new `Rc<str>` from a [string slice].
/// The underlying bytes are copied from it.
///
/// # Examples
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// let slice = "hello world!";
/// let rc: Rc<str> = Rc::from(slice);
/// assert_eq!(rc.as_ref(), slice); // The elements match.
/// assert_eq!(rc.len(), slice.len()); // The length is the same.
/// ```
///
/// Using the [`Into`][Into] trait:
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// let slice = "hello world!";
/// let rc: Rc<str> = slice.into();
/// assert_eq!(rc.as_ref(), slice); // The elements match.
/// assert_eq!(rc.len(), slice.len()); // The length is the same.
/// ```
///
/// This can be useful in doing [string interning], and caching your strings.
///
/// ```
/// // For Rc::ptr_eq
/// #![feature(ptr_eq)]
///
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
/// use std::collections::HashSet;
/// use std::mem::drop;
///
/// fn cache_str(cache: &mut HashSet<Rc<str>>, input: &str) -> Rc<str> {
/// // If the input hasn't been cached, do it:
/// if !cache.contains(input) {
/// cache.insert(input.into());
/// }
///
/// // Retrieve the cached element.
/// cache.get(input).unwrap().clone()
/// }
///
/// let first = "hello world!";
/// let second = "goodbye!";
/// let mut set = HashSet::new();
///
/// // Cache the slices:
/// let rc_first = cache_str(&mut set, first);
/// let rc_second = cache_str(&mut set, second);
/// let rc_third = cache_str(&mut set, second);
///
/// // The contents match:
/// assert_eq!(rc_first.as_ref(), first);
/// assert_eq!(rc_second.as_ref(), second);
/// assert_eq!(rc_third.as_ref(), rc_second.as_ref());
///
/// // It was cached:
/// assert_eq!(set.len(), 2);
/// drop(set);
/// assert_eq!(Rc::strong_count(&rc_first), 1);
/// assert_eq!(Rc::strong_count(&rc_second), 2);
/// assert_eq!(Rc::strong_count(&rc_third), 2);
/// assert!(Rc::ptr_eq(&rc_second, &rc_third));
///
/// [string interning]: https://en.wikipedia.org/wiki/String_interning
fn from(slice: &'a str) -> Self {
// This is safe since the input was valid utf8 to begin with, and thus
// the invariants hold.
unsafe {
let bytes = slice.as_bytes();
slice_to_rc(bytes, |p| p as *mut RcBox<str>, <[u8]>::copy_from_slice)
}
}
}
#[unstable(feature = "shared_from_slice",
reason = "TODO",
issue = "TODO")]
impl<T: ?Sized> From<Box<T>> for Rc<T> {
/// Constructs a new `Rc<T>` from a `Box<T>` where `T` can be unsized.
/// The allocated space of the `Box<T>` is not reused,
/// but new space is allocated and the old is deallocated.
/// This happens due to the internal layout of `Rc`.
///
/// # Examples
///
/// ```
/// #![feature(shared_from_slice)]
/// use std::rc::Rc;
///
/// let arr = [1, 2, 3];
/// let vec = vec![Box::new(1), Box::new(2), Box::new(3)].into_boxed_slice();
/// let rc: Rc<[Box<usize>]> = Rc::from(vec);
/// assert_eq!(rc.len(), arr.len());
/// for (x, y) in rc.iter().zip(&arr) {
/// assert_eq!(**x, *y);
/// }
/// ```
#[inline]
fn from(boxed: Box<T>) -> Self {
unsafe {
// Compute space to allocate + alignment for `RcBox<T>`.
let sizeb = mem::size_of_val(&*boxed);
let alignb = mem::align_of_val(&*boxed);
let align = cmp::max(alignb, mem::align_of::<usize>());
let size = offset_of_unsafe!(RcBox<T>, value) + sizeb;
// Allocate the space.
let alloc = heap::allocate(size, align);
// Cast to fat pointer: *mut RcBox<T>.
let bptr = Box::into_raw(boxed);
let rcbox_ptr = {
let mut tmp = bptr;
ptr::write(&mut tmp as *mut _ as *mut * mut u8, alloc);
tmp as *mut RcBox<T>
};
// Initialize fields of RcBox<T>.
(*rcbox_ptr).strong.set(1);
(*rcbox_ptr).weak.set(1);
ptr::copy_nonoverlapping(
bptr as *const u8,
(&mut (*rcbox_ptr).value) as *mut T as *mut u8,
sizeb);
// Deallocate box, we've already forgotten it.
heap::deallocate(bptr as *mut u8, sizeb, alignb);
// Yield the Rc:
assert_eq!(size, mem::size_of_val(&*rcbox_ptr));
Rc { ptr: Shared::new(rcbox_ptr) }
}
}
}
These work on zero sized slices and vectors as well.
With more safe abstractions in the future, this can perhaps be rewritten with less unsafe code. But this should not change the API itself and thus will never cause a breaking change.
For Arc
For the sake of brevity, just use the implementation above, and replace:
slice_to_rc
withslice_to_arc
,RcBox
withArcInner
,rcbox_ptr
witharcinner_ptr
,Rc
withArc
.
How We Teach This
The documentation provided in the impls
should be enough.
Drawbacks
The main drawback would be increasing the size of the standard library.
Alternatives
- Only implement this for
T: Copy
and skipT: Clone
. - Let other libraries do this. This has the problems explained in the motivation
section above regarding
RcBox
not being publicly exposed as well as the amount of feature gates needed to roll ones ownRc
alternative - for little gain. - Only implement this for
Rc
and skip it forArc
. - Skip this for
Vec
. - Only implement this for
Vec
. - Skip this for
Box
. - Use
AsRef
. For example:impl<'a> From<&'a str> for Rc<str>
becomesimpl From<AsRef<str>> for Rc<str>
. It could potentially make the API a bit more ergonomic to use. However, it could run afoul of coherence issues, preventing other wanted impls. This RFC currently leans towards not using it. - Add these trait implementations of
From
as functions on&str
like.into_rc_str()
and on&[T]
like.into_rc_slice()
. This RFC currently leans towards usingFrom
implementations for the sake of uniformity and ergonomics. It also has the added benefit of letting you remember one method name instead of many. One could also considerString::into_boxed_str
andVec::into_boxed_slice
, since these are similar with the difference being that this version uses theFrom
trait, and is converted into a shared smart pointer instead. - Also add these APIs as
associated functions
onRc
andArc
as follows:
impl<T: Clone> Rc<[T]> {
fn from_slice(slice: &[T]) -> Self;
}
impl Rc<str> {
fn from_str(slice: &str) -> Self;
}
impl<T: Clone> Arc<[T]> {
fn from_slice(slice: &[T]) -> Self;
}
impl Arc<str> {
fn from_str(slice: &str) -> Self;
}
Unresolved questions
- Should a special version of
make_mut
be added forRc<[T]>
? This could look like:
impl<T> Rc<[T]> where T: Clone {
fn make_mut_slice(this: &mut Rc<[T]>) -> &mut [T]
}