- Feature Name: specialization
- Start Date: 2015-06-17
- RFC PR: rust-lang/rfcs#1210
- Rust Issue: rust-lang/rust#31844
Summary
This RFC proposes a design for specialization, which permits multiple impl
blocks to apply to the same type/trait, so long as one of the blocks is clearly
“more specific” than the other. The more specific impl
block is used in a case
of overlap. The design proposed here also supports refining default trait
implementations based on specifics about the types involved.
Altogether, this relatively small extension to the trait system yields benefits for performance and code reuse, and it lays the groundwork for an “efficient inheritance” scheme that is largely based on the trait system (described in a forthcoming companion RFC).
Motivation
Specialization brings benefits along several different axes:
- Performance: specialization expands the scope of “zero cost abstraction”, because specialized impls can provide custom high-performance code for particular, concrete cases of an abstraction.
- Reuse: the design proposed here also supports refining default (but incomplete) implementations of a trait, given details about the types involved.
- Groundwork: the design lays the groundwork for supporting “efficient inheritance” through the trait system.
The following subsections dive into each of these motivations in more detail.
Performance
The simplest and most longstanding motivation for specialization is performance.
To take a very simple example, suppose we add a trait for overloading the `+=` operator:
trait AddAssign<Rhs=Self> {
fn add_assign(&mut self, rhs: Rhs);
}
It’s tempting to provide an impl for any type that you can both `Clone` and `Add`:
impl<R, T: Add<R> + Clone> AddAssign<R> for T {
fn add_assign(&mut self, rhs: R) {
let tmp = self.clone() + rhs;
*self = tmp;
}
}
This impl is especially nice because it means that you frequently don’t have to bound separately by `Add` and `AddAssign`; often `Add` is enough to give you both operators.
However, in today’s Rust, such an impl would rule out any more specialized implementation that, for example, avoids the call to `clone`. That means there’s
a tension between simple abstractions and code reuse on the one hand, and
performance on the other. Specialization resolves this tension by allowing both
the blanket impl, and more specific ones, to coexist, using the specialized ones
whenever possible (and thereby guaranteeing maximal performance).
More broadly, traits today can provide static dispatch in Rust, but they can still impose an abstraction tax. For example, consider the `Extend` trait:
pub trait Extend<A> {
fn extend<T>(&mut self, iterable: T) where T: IntoIterator<Item=A>;
}
Collections that implement the trait are able to insert data from arbitrary iterators. Today, that means that the implementation can assume nothing about the argument `iterable` that it’s given except that it can be transformed into an iterator. That means the code must work by repeatedly calling `next` and inserting elements one at a time.
But in specific cases, like extending a vector with a slice, a much more efficient implementation is possible – and the optimizer isn’t always capable of producing it automatically. In such cases, specialization can be used to get the best of both worlds: retaining the abstraction of `extend` while providing custom code for specific cases.
The design in this RFC relies on multiple, overlapping trait impls, so to take advantage for `Extend` we need to refactor a bit:
pub trait Extend<A, T: IntoIterator<Item=A>> {
fn extend(&mut self, iterable: T);
}
// The generic implementation
impl<A, T> Extend<A, T> for Vec<A> where T: IntoIterator<Item=A> {
// the `default` qualifier allows this method to be specialized below
default fn extend(&mut self, iterable: T) {
... // implementation using push (like today's extend)
}
}
// A specialized implementation for slices
impl<'a, A> Extend<A, &'a [A]> for Vec<A> {
fn extend(&mut self, iterable: &'a [A]) {
... // implementation using ptr::write (like push_all)
}
}
Other kinds of specialization are possible, including using marker traits like:

unsafe trait TrustedSizeHint {}

that can allow the optimization to apply to a broader set of types than slices, but are still more specific than `T: IntoIterator`.
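For illustration, here is a hedged sketch of what such a marker-based impl could look like, assuming the refactored `Extend` trait above (this impl is hypothetical and not part of the RFC’s proposal):

```rust
// Hypothetical sketch: an impl that is more specific than the fully generic
// impl above (it adds a where clause), yet applies to more types than the
// slice-only impl.
impl<A, T> Extend<A, T> for Vec<A>
    where T: IntoIterator<Item = A>,
          T::IntoIter: TrustedSizeHint,
{
    default fn extend(&mut self, iterable: T) {
        let iter = iterable.into_iter();
        // the marker lets us trust the lower bound and reserve exactly once
        self.reserve(iter.size_hint().0);
        for elem in iter {
            self.push(elem);
        }
    }
}
```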
Reuse
Today’s default methods in traits are pretty limited: they can assume only the `where` clauses provided by the trait itself, and there is no way to provide conditional or refined defaults that rely on more specific type information.
For example, consider a different design for overloading `+` and `+=`, such that they are always overloaded together:
trait Add<Rhs=Self> {
type Output;
fn add(self, rhs: Rhs) -> Self::Output;
fn add_assign(&mut self, rhs: Rhs);
}
In this case, there’s no natural way to provide a default implementation of `add_assign`, since we do not want to restrict the `Add` trait to `Clone` data.
The specialization design in this RFC also allows for default impls, which can provide specialized defaults without actually providing a full trait implementation:
// the `default` qualifier here means (1) not all items are implied
// and (2) those that are can be further specialized
default impl<T: Clone, Rhs> Add<Rhs> for T {
fn add_assign(&mut self, rhs: Rhs) {
let tmp = self.clone() + rhs;
*self = tmp;
}
}
This default impl does not mean that `Add` is implemented for all `Clone` data, but just that when you do `impl Add` and `Self: Clone`, you can leave off `add_assign`:
#[derive(Copy, Clone)]
struct Complex {
// ...
}
impl Add<Complex> for Complex {
type Output = Complex;
fn add(self, rhs: Complex) -> Complex {
// ...
}
// no fn add_assign necessary
}
A particularly nice case of refined defaults comes from trait hierarchies: you can sometimes use methods from subtraits to improve default supertrait methods. For example, consider the relationship between `size_hint` and `ExactSizeIterator`:
default impl<T> Iterator for T where T: ExactSizeIterator {
fn size_hint(&self) -> (usize, Option<usize>) {
(self.len(), Some(self.len()))
}
}
Supporting efficient inheritance
Finally, specialization can be seen as a form of inheritance, since methods defined within a blanket impl can be overridden in a fine-grained way by a more specialized impl. As we will see, this analogy is a useful guide to the design of specialization. But it is more than that: the specialization design proposed here is specifically tailored to support “efficient inheritance” schemes (like those discussed here) without adding an entirely separate inheritance mechanism.
The key insight supporting this design is that virtual method definitions in languages like C++ and Java actually encompass two distinct mechanisms: virtual dispatch (also known as “late binding”) and implementation inheritance. These two mechanisms can be separated and addressed independently; this RFC encompasses an “implementation inheritance” mechanism distinct from virtual dispatch, and useful in a number of other circumstances. But it can be combined nicely with an orthogonal mechanism for virtual dispatch to give a complete story for the “efficient inheritance” goal that many previous RFCs targeted.
The author is preparing a companion RFC showing how this can be done with a relatively small further extension to the language. But it should be said that the design in this RFC is fully motivated independently of its companion RFC.
Detailed design
There’s a fair amount of material to cover, so we’ll start with a basic overview of the design in intuitive terms, and then look more formally at a specification.
At the simplest level, specialization is about allowing overlap between impl blocks, so long as there is always an unambiguous “winner” for any type falling into the overlap. For example:
impl<T> Debug for T where T: Display {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
<Self as Display>::fmt(self, f)
}
}
impl Debug for String {
fn fmt(&self, f: &mut Formatter) -> Result {
try!(write!(f, "\""));
for c in self.chars().flat_map(|c| c.escape_default()) {
try!(write!(f, "{}", c));
}
write!(f, "\"")
}
}
The idea for this pair of impls is that you can rest assured that any type implementing `Display` will also implement `Debug` via a reasonable default, but go on to provide more specific `Debug` implementations when warranted. In particular, the intuition is that a `Self` type of `String` is somehow “more specific” or “more concrete” than `T where T: Display`.
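As a usage sketch (assuming the pair of impls above were accepted; `Celsius` is a made-up type):

```rust
use std::fmt;

// A type that only bothers to implement Display...
struct Celsius(f64);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}°C", self.0)
    }
}

fn main() {
    // ...still gets a Debug implementation via the Display-based blanket impl,
    println!("{:?}", Celsius(21.5));
    // while String uses its own, more specific Debug impl.
    println!("{:?}", String::from("hello"));
}
```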
The bulk of the detailed design is aimed at making this intuition more precise. But first, we need to explore some problems that arise when you introduce specialization in any form.
Hazard: interactions with type checking
Consider the following, somewhat odd example of overlapping impls:
trait Example {
type Output;
fn generate(self) -> Self::Output;
}
impl<T> Example for T {
type Output = Box<T>;
fn generate(self) -> Box<T> { Box::new(self) }
}
impl Example for bool {
type Output = bool;
fn generate(self) -> bool { self }
}
The key point to pay attention to here is the difference in associated types: the blanket impl uses `Box<T>`, while the impl for `bool` just uses `bool`.
If we write some code that uses the above impls, we can get into trouble:
fn trouble<T>(t: T) -> Box<T> {
Example::generate(t)
}
fn weaponize() -> bool {
let b: Box<bool> = trouble(true);
*b
}
What’s going on? When type checking `trouble`, the compiler has a type `T` about which it knows nothing, and sees an attempt to employ the `Example` trait via `Example::generate(t)`. Because of the blanket impl, this use of `Example` is allowed – but furthermore, the associated type found in the blanket impl is now directly usable, so that `<T as Example>::Output` is known within `trouble` to be `Box<T>`, allowing `trouble` to type check. But during monomorphization, `weaponize` will actually produce a version of the code that returns a boolean instead, and then attempt to dereference that boolean. In other words, things look different to the typechecker than they do to codegen. Oops.
So what went wrong? It should be fine for the compiler to assume that `T: Example` for all `T`, given the blanket impl. But it’s clearly problematic to also assume that the associated types will be the ones given by that blanket impl. Thus, the “obvious” solution is just to generate a type error in `trouble` by preventing it from assuming `<T as Example>::Output` is `Box<T>`.
Unfortunately, this solution doesn’t work. For one thing, it would be a breaking change, since the following code does compile today:
trait Example {
type Output;
fn generate(self) -> Self::Output;
}
impl<T> Example for T {
type Output = Box<T>;
fn generate(self) -> Box<T> { Box::new(self) }
}
fn trouble<T>(t: T) -> Box<T> {
Example::generate(t)
}
And there are definitely cases where this pattern is important. To pick just one example, consider the following impl for the slice iterator:
impl<'a, T> Iterator for Iter<'a, T> {
type Item = &'a T;
// ...
}
It’s essential that downstream code be able to assume that `<Iter<'a, T> as Iterator>::Item` is just `&'a T`, no matter what `'a` and `T` happen to be.
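For example, the following sketch of downstream code only type checks because that associated type is known to normalize to `&'a T`:

```rust
// Relies on `<Iter<'a, T> as Iterator>::Item` normalizing to `&'a T`.
fn first<'a, T>(mut iter: std::slice::Iter<'a, T>) -> Option<&'a T> {
    iter.next()
}
```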
Furthermore, it doesn’t work to say that the compiler can make this kind of assumption unless specialization is being used, since we want to allow downstream crates to add specialized impls. We need to know up front.
Another possibility would be to simply disallow specialization of associated types. But the trouble described above isn’t limited to associated types. Every function/method in a trait has an implicit associated type that implements the closure types, and similar bad assumptions about blanket impls can crop up there. It’s not entirely clear whether they can be weaponized, however. (That said, it may be reasonable to stabilize only specialization of functions/methods to begin with, and wait for strong use cases of associated type specialization to emerge before stabilizing that.)
The solution proposed in this RFC is instead to treat specialization of items in a trait as a per-item opt in, described in the next section.
The `default` keyword
Many statically-typed languages that allow refinement of behavior in some hierarchy also come with ways to signal whether or not this is allowed:
- C++ requires the `virtual` keyword to permit a method to be overridden in subclasses. Modern C++ also supports `final` and `override` qualifiers.
- C# requires the `virtual` keyword at definition and `override` at point of overriding an existing method.
- Java makes things silently virtual, but supports `final` as an opt out.
Why have these qualifiers? Overriding implementations is, in a way, “action at a distance”. It means that the code that’s actually being run isn’t obvious when e.g. a class is defined; it can change in subclasses defined elsewhere. Requiring qualifiers is a way of signaling that this non-local change is happening, so that you know you need to look more globally to understand the actual behavior of the class.
While impl specialization does not directly involve virtual dispatch, it’s closely-related to inheritance, and it allows some amount of “action at a distance” (modulo, as we’ll see, coherence rules). We can thus borrow directly from these previous designs.
This RFC proposes a “final-by-default” semantics akin to C++ that is backwards-compatible with today’s Rust, which means that the following overlapping impls are prohibited:
impl<T> Example for T {
type Output = Box<T>;
fn generate(self) -> Box<T> { Box::new(self) }
}
impl Example for bool {
type Output = bool;
fn generate(self) -> bool { self }
}
The error in these impls is that the first impl is implicitly defining “final” versions of its items, which are thus not allowed to be refined in further specializations.
If you want to allow specialization of an item, you do so via the `default` qualifier within the impl block:
impl<T> Example for T {
default type Output = Box<T>;
default fn generate(self) -> Box<T> { Box::new(self) }
}
impl Example for bool {
type Output = bool;
fn generate(self) -> bool { self }
}
Thus, when you’re trying to understand what code is going to be executed, if you see an impl that applies to a type and the relevant item is not marked `default`, you know that the definition you’re looking at is the one that will apply. If, on the other hand, the item is marked `default`, you need to scan for other impls that could apply to your type. The coherence rules, described below, help limit the scope of this search in practice.
This design optimizes for fine-grained control over when specialization is permitted. It’s worth pausing for a moment and considering some alternatives and questions about the design:
- Why mark `default` on impls rather than the trait? There are a few reasons to have `default` apply at the impl level. First of all, traits are fundamentally interfaces, while `default` is really about implementations. Second, as we’ll see, it’s useful to be able to “seal off” a certain avenue of specialization while leaving others open; doing it at the trait level is an all-or-nothing choice.
- Why mark `default` on items rather than the entire impl? Again, this is largely about granularity; it’s useful to be able to pin down part of an impl while leaving others open for specialization. Furthermore, while this RFC doesn’t propose to do it, we could easily add a shorthand later on in which `default impl Trait for Type` is sugar for adding `default` to all items in the impl.
- Won’t `default` be confused with default methods? Yes! But usefully so: as we’ll see, in this RFC’s design today’s default methods become sugar for tomorrow’s specialization.
Finally, how does `default` help with the hazards described above? Easy: an associated type from a blanket impl must be treated “opaquely” if it’s marked `default`. That is, if you write these impls:
impl<T> Example for T {
default type Output = Box<T>;
default fn generate(self) -> Box<T> { Box::new(self) }
}
impl Example for bool {
type Output = bool;
fn generate(self) -> bool { self }
}
then the function `trouble` will fail to typecheck:
fn trouble<T>(t: T) -> Box<T> {
Example::generate(t)
}
The error is that `<T as Example>::Output` no longer normalizes to `Box<T>`, because the applicable blanket impl marks the type as `default`. The fact that `default` is an opt in makes this behavior backwards-compatible.
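Generic code can still name the associated type opaquely, however. For instance, a hedged sketch of a version of `trouble` that would still type check:

```rust
// Fine: uses the associated type without assuming what it normalizes to.
fn no_trouble<T>(t: T) -> <T as Example>::Output {
    Example::generate(t)
}
```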
The main drawbacks of this solution are:
- API evolution. Adding `default` to an associated type takes away some abilities, which makes it a breaking change to a public API. (In principle, this is probably true for functions/methods as well, but the breakage there is theoretical at most.) However, given the design constraints discussed so far, this seems like an inevitable aspect of any simple, backwards-compatible design.
- Verbosity. It’s possible that certain uses of the trait system will result in typing `default` quite a bit. This RFC takes a conservative approach of introducing the keyword at a fine-grained level, but leaving the door open to adding shorthands (like writing `default impl ...`) in the future, if need be.
Overlapping impls and specialization
What is overlap?
Rust today does not allow any “overlap” between impls. Intuitively, this means that you cannot write two trait impls that could apply to the same “input” types. (An input type is either `Self` or a type parameter of the trait). For overlap to occur, the input types must be able to “unify”, which means that there’s some way of instantiating any type parameters involved so that the input types are the same. Here are some examples:
trait Foo {}
// No overlap: String and Vec<u8> cannot unify.
impl Foo for String {}
impl Foo for Vec<u8> {}
// No overlap: Vec<u16> and Vec<u8> cannot unify because u16 and u8 cannot unify.
impl Foo for Vec<u16> {}
impl Foo for Vec<u8> {}
// Overlap: T can be instantiated to String.
impl<T> Foo for T {}
impl Foo for String {}
// Overlap: Vec<T> and Vec<u8> can unify because T can be instantiated to u8.
impl<T> Foo for Vec<T> {}
impl Foo for Vec<u8> {}
// No overlap: String and Vec<T> cannot unify, no matter what T is.
impl Foo for String {}
impl<T> Foo for Vec<T> {}
// Overlap: for any T that is Clone, both impls apply.
impl<T> Foo for Vec<T> where T: Clone {}
impl<T> Foo for Vec<T> {}
// No overlap: implicitly, T: Sized, and since the trait object type Foo is not Sized, you cannot instantiate T with it.
impl<T> Foo for Box<T> {}
impl Foo for Box<Foo> {}
trait Trait1 {}
trait Trait2 {}
// Overlap: nothing prevents a T such that T: Trait1 + Trait2.
impl<T: Trait1> Foo for T {}
impl<T: Trait2> Foo for T {}
trait Trait3 {}
trait Trait4: Trait3 {}
// Overlap: any T: Trait4 is covered by both impls.
impl<T: Trait3> Foo for T {}
impl<T: Trait4> Foo for T {}
trait Bar<T> {}
// No overlap: *all* input types must unify for overlap to happen.
impl Bar<u8> for u8 {}
impl Bar<u16> for u8 {}
// No overlap: *all* input types must unify for overlap to happen.
impl<T> Bar<u8> for T {}
impl<T> Bar<u16> for T {}
// No overlap: no way to instantiate T such that T == u8 and T == u16.
impl<T> Bar<T> for T {}
impl Bar<u16> for u8 {}
// Overlap: instantiate U as T.
impl<T> Bar<T> for T {}
impl<T, U> Bar<T> for U {}
// No overlap: no way to instantiate T such that T == &'a T.
impl<T> Bar<T> for T {}
impl<'a, T> Bar<&'a T> for T {}
// Overlap: instantiate T = &'a U.
impl<T> Bar<T> for T {}
impl<'a, T, U> Bar<T> for &'a U where U: Bar<T> {}
Permitting overlap
The goal of specialization is to allow overlapping impls, but it’s not as simple as permitting all overlap. There has to be a way to decide which of two overlapping impls to actually use for a given set of input types. The simpler and more intuitive the rule for deciding, the easier it is to write and reason about code – and since dispatch is already quite complicated, simplicity here is a high priority. On the other hand, the design should support as many of the motivating use cases as possible.
The basic intuition we’ve been using for specialization is the idea that one impl is “more specific” than another it overlaps with. Before turning this intuition into a rule, let’s go through the previous examples of overlap and decide which, if any, of the impls is intuitively more specific. Note that since we’re leaving out the body of the impls, you won’t see the `default` keyword that would be required in practice for the less specialized impls.
trait Foo {}
// Overlap: T can be instantiated to String.
impl<T> Foo for T {}
impl Foo for String {} // String is more specific than T
// Overlap: Vec<T> and Vec<u8> can unify because T can be instantiated to u8.
impl<T> Foo for Vec<T> {}
impl Foo for Vec<u8> {} // Vec<u8> is more specific than Vec<T>
// Overlap: for any T that is Clone, both impls apply.
impl<T> Foo for Vec<T> // "Vec<T> where T: Clone" is more specific than "Vec<T> for any T"
where T: Clone {}
impl<T> Foo for Vec<T> {}
trait Trait1 {}
trait Trait2 {}
// Overlap: nothing prevents a T such that T: Trait1 + Trait2
impl<T: Trait1> Foo for T {} // Neither is more specific;
impl<T: Trait2> Foo for T {} // there's no relationship between the traits here
trait Trait3 {}
trait Trait4: Trait3 {}
// Overlap: any T: Trait4 is covered by both impls.
impl<T: Trait3> Foo for T {}
impl<T: Trait4> Foo for T {} // T: Trait4 is more specific than T: Trait3
trait Bar<T> {}
// Overlap: instantiate U as T.
impl<T> Bar<T> for T {} // More specific since both input types are identical
impl<T, U> Bar<T> for U {}
// Overlap: instantiate T = &'a U.
impl<T> Bar<T> for T {} // Neither is more specific
impl<'a, T, U> Bar<T> for &'a U
where U: Bar<T> {}
What are the patterns here?
- Concrete types are more specific than type variables, e.g.:
  - `String` is more specific than `T`
  - `Vec<u8>` is more specific than `Vec<T>`
- More constraints lead to more specific impls, e.g.:
  - `T: Clone` is more specific than `T`
  - `Bar<T> for T` is more specific than `Bar<T> for U`
- Unrelated constraints don’t contribute, e.g.:
  - Neither `T: Trait1` nor `T: Trait2` is more specific than the other.
For many purposes, the above simple patterns are sufficient for working with specialization. But to provide a spec, we need a more general, formal way of deciding precedence; we’ll give one next.
Defining the precedence rules
An impl block `I` contains basically two pieces of information relevant to specialization:
- A set of type variables, like `T, U` in `impl<T, U> Bar<T> for U`.
  - We’ll call this `I.vars`.
- A set of where clauses, like `T: Clone` in `impl<T: Clone> Foo for Vec<T>`.
  - We’ll call this `I.wc`.
We’re going to define a specialization relation `<=` between impl blocks, so that `I <= J` means that impl block `I` is “at least as specific as” impl block `J`. (If you want to think of this in terms of “size”, you can imagine that the set of types `I` applies to is no bigger than those `J` applies to.)

We’ll say that `I < J` if `I <= J` and `!(J <= I)`. In this case, `I` is more specialized than `J`.
To ensure specialization is coherent, we will ensure that for any two impls `I` and `J` that overlap, we have either `I < J` or `J < I`. That is, one must be truly more specific than the other. Specialization chooses the “smallest” impl in this order – and the new overlap rule ensures there is a unique smallest impl among those that apply to a given set of input types.

More broadly, while `<=` is not a total order on all impls of a given trait, it will be a total order on any set of impls that all mutually overlap, which is all we need to determine which impl to use.
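As a quick illustration (a sketch using an item-less marker trait), the following three impls mutually overlap and form a chain under `<=`:

```rust
trait Foo {}

impl<T> Foo for T {}            // least specific
impl<T: Clone> Foo for T {}     // more specific: adds a constraint
impl Foo for String {}          // most specific: a concrete type (and String: Clone)
```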
One nice thing about this approach is that, if there is an overlap without there being an intersecting impl, the compiler can tell the programmer precisely which impl needs to be written to disambiguate the overlapping portion.
We’ll start with an abstract/high-level formulation, and then build up toward an algorithm for deciding specialization by introducing a number of building blocks.
Abstract formulation
Recall that the input types of a trait are the `Self` type and all trait type parameters. So the following impl has input types `bool`, `u8` and `String`:
trait Baz<X, Y> { .. }
// impl I
impl Baz<bool, u8> for String { .. }
If you think of these input types as a tuple, `(bool, u8, String)`, you can think of each trait impl `I` as determining a set `apply(I)` of input type tuples that obeys `I`’s where clauses. The impl above is just the singleton set `apply(I) = { (bool, u8, String) }`. Here’s a more interesting case:

// impl J
impl<T, U> Baz<T, u8> for U where T: Clone { .. }

which gives the set `apply(J) = { (T, u8, U) | T: Clone }`.
Two impls `I` and `J` overlap if `apply(I)` and `apply(J)` intersect.

We can now define the specialization order abstractly: `I <= J` if `apply(I)` is a subset of `apply(J)`.

This is true of the two sets above: `apply(I) = { (bool, u8, String) }` is a strict subset of `apply(J) = { (T, u8, U) | T: Clone }`.
Here are a few more examples.
Via where clauses:
// impl I
// apply(I) = { T | T a type }
impl<T> Foo for T {}
// impl J
// apply(J) = { T | T: Clone }
impl<T> Foo for T where T: Clone {}
// J < I
Via type structure:
// impl I
// apply(I) = { (T, U) | T, U types }
impl<T, U> Bar<T> for U {}
// impl J
// apply(J) = { (T, T) | T a type }
impl<T> Bar<T> for T {}
// J < I
The same reasoning can be applied to all of the examples we saw earlier, and the reader is encouraged to do so. We’ll look at one of the more subtle cases here:
// impl I
// apply(I) = { (T, T) | T any type }
impl<T> Bar<T> for T {}
// impl J
// apply(J) = { (T, &'a U) | U: Bar<T>, 'a any lifetime }
impl<'a, T, U> Bar<T> for &'a U where U: Bar<T> {}
The claim is that `apply(I)` and `apply(J)` intersect, but neither contains the other. Thus, these two impls are not permitted to coexist according to this RFC’s design. (We’ll revisit this limitation toward the end of the RFC.)
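To spell the claim out: `apply(I)` contains exactly the pairs whose two input types are identical, while `apply(J)` contains exactly the pairs whose `Self` type is some reference `&'a U` with `U: Bar<T>`. A pair like `(bool, bool)` lies in `apply(I)` but not `apply(J)`; a pair `(T, &'a U)` with `U: Bar<T>` and `&'a U` different from `T` lies in `apply(J)` but not `apply(I)`; and the two sets still intersect at pairs of the form `(&'a U, &'a U)` whenever `U: Bar<&'a U>`.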
Algorithmic formulation
The goal in the remainder of this section is to turn the above abstract definition of `<=` into something closer to an algorithm, connected to existing mechanisms in the Rust compiler. We’ll start by reformulating `<=` in a way that effectively “inlines” `apply`:
`I <= J` if:

- For any way of instantiating `I.vars`, there is some way of instantiating `J.vars` such that the `Self` type and trait type parameters match up.
- For this instantiation of `I.vars`, if you assume `I.wc` holds, you can prove `J.wc`.
It turns out that the compiler is already quite capable of answering these questions, via “unification” and “skolemization”, which we’ll see next.
Unification: solving equations on types
Unification is the workhorse of type inference and many other mechanisms in the Rust compiler. You can think of it as a way of solving equations on types that contain variables. For example, consider the following situation:
fn use_vec<T>(v: Vec<T>) { .. }
fn caller() {
let v = vec![0u8, 1u8];
use_vec(v);
}
The compiler ultimately needs to infer what type to use for the `T` in `use_vec` within the call in `caller`, given that the actual argument has type `Vec<u8>`. You can frame this as a unification problem: solve the equation `Vec<T> = Vec<u8>`. Easy enough: `T = u8`!
Some equations can’t be solved. For example, if we wrote instead:
fn caller() {
let s = "hello";
use_vec(s);
}
we would end up equating `Vec<T> = &str`. There’s no choice of `T` that makes that equation work out. Type error!

Unification often involves solving a series of equations between types simultaneously, but it’s not like high school algebra; the equations involved all have the limited form of `type1 = type2`.
One immediate way in which unification is relevant to this RFC is in determining when two impls “overlap”: roughly speaking, they overlap if each pair of input types can be unified simultaneously. For example:
// No overlap: String and bool do not unify
impl Foo for String { .. }
impl Foo for bool { .. }
// Overlap: String and T unify
impl Foo for String { .. }
impl<T> Foo for T { .. }
// Overlap: T = U, T = V is trivially solvable
impl<T> Bar<T> for T { .. }
impl<U, V> Bar<U> for V { .. }
// No overlap: T = u8, T = bool not solvable
impl<T> Bar<T> for T { .. }
impl Bar<u8> for bool { .. }
Note the difference in how concrete types and type variables work for unification. When `T`, `U` and `V` are variables, it’s fine to say that `T = U`, `T = V` is solvable: we can make the impls overlap by instantiating all three variables with the same type. But asking for e.g. `String = bool` fails, because these are concrete types, not variables. (The same happens in algebra; consider that `2 = 3` cannot be solved, but `x = y` and `y = z` can be.) This distinction may seem obvious, but we’ll next see how to leverage it in a somewhat subtle way.
Skolemization: asking forall/there exists questions
We’ve already rephrased `<=` to start with a “for all, there exists” problem:

- For any way of instantiating `I.vars`, there is some way of instantiating `J.vars` such that the `Self` type and trait type parameters match up.
For example:
// impl I
impl<T> Bar<T> for T {}
// impl J
impl<U,V> Bar<U> for V {}
For any choice of `T`, it’s possible to choose a `U` and `V` such that the two impls match – just choose `U = T` and `V = T`. But the opposite isn’t possible: if `U` and `V` are different (say, `String` and `bool`), then no choice of `T` will make the two impls match up.
This feels similar to a unification problem, and it turns out we can solve it with unification using a scary-sounding trick known as “skolemization”.
Basically, to “skolemize” a type variable is to treat it as if it were a concrete type. So if `U` and `V` are skolemized, then `U = V` is unsolvable, in the same way that `String = bool` is unsolvable. That’s perfect for capturing the “for any instantiation of `I.vars`” part of what we want to formalize.
With this tool in hand, we can further rephrase the “for all, there exists” part of `<=` in the following way:

- After skolemizing `I.vars`, it’s possible to unify `I` and `J`.
Note that a successful unification through skolemization gives you the same answer as you’d get if you unified without skolemizing.
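As a worked sketch for the example above: to decide `I <= J`, skolemize `I.vars`, treating `T` as an opaque concrete type, and unify the headers `Bar<T> for T` and `Bar<U> for V`; this succeeds with `U = T`, `V = T`, so `I <= J` holds. To decide `J <= I`, skolemize `U` and `V` instead; unification then demands `U = V`, which fails for two distinct skolemized types (just as `String = bool` fails), so `J <= I` does not hold, and therefore `I < J`.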
The algorithmic version
One outcome of running unification on two impls as above is that we can understand both impl headers in terms of a single set of type variables. For example:
// Before unification:
impl<T> Bar<T> for T where T: Clone { .. }
impl<U, V> Bar<U> for Vec<V> where V: Debug { .. }
// After unification:
// T = Vec<W>
// U = Vec<W>
// V = W
impl<W> Bar<Vec<W>> for Vec<W> where Vec<W>: Clone { .. }
impl<W> Bar<Vec<W>> for Vec<W> where W: Debug { .. }
By putting everything in terms of a single set of type params, it becomes possible to do things like compare the `where` clauses, which is the last piece we need for a final rephrasing of `<=` that we can implement directly.

Putting it all together, we’ll say `I <= J` if:

- After skolemizing `I.vars`, it’s possible to unify `I` and `J`.
- Under the resulting unification, `I.wc` implies `J.wc`.
Let’s look at a couple more examples to see how this works:
trait Trait1 {}
trait Trait2 {}
// Overlap: nothing prevents a T such that T: Trait1 + Trait2
impl<T: Trait1> Foo for T {} // Neither is more specific;
impl<T: Trait2> Foo for T {} // there's no relationship between the traits here
In comparing these two impls in either direction, we make it past unification and must try to prove that one where clause implies another. But `T: Trait1` does not imply `T: Trait2`, nor vice versa, so neither impl is more specific than the other. Since the impls do overlap, an ambiguity error is reported.
On the other hand:
trait Trait3 {}
trait Trait4: Trait3 {}
// Overlap: any T: Trait4 is covered by both impls.
impl<T: Trait3> Foo for T {}
impl<T: Trait4> Foo for T {} // T: Trait4 is more specific than T: Trait3
Here, since `T: Trait4` implies `T: Trait3` but not vice versa, we get

`impl<T: Trait4> Foo for T` < `impl<T: Trait3> Foo for T`
Key properties
Remember that for each pair of impls `I`, `J`, the compiler will check that exactly one of the following holds:

- `I` and `J` do not overlap (a unification check), or else
- `I < J`, or else
- `J < I`
Recall also that if there is an overlap without there being an intersecting impl, the compiler can tell the programmer precisely which impl needs to be written to disambiguate the overlapping portion.
Since `I <= J` ultimately boils down to a subset relationship, we get a lot of nice properties for free (e.g., transitivity: if `I <= J <= K` then `I <= K`).

Together with the compiler check above, we know that at monomorphization time, after filtering to the impls that apply to some concrete input types, there will always be a unique, smallest impl in specialization order. (In particular, if multiple impls apply to concrete input types, those impls must overlap.)
There are various implementation strategies that avoid having to recalculate the ordering during monomorphization, but we won’t delve into those details in this RFC.
Implications for coherence
The coherence rules ensure that there is never an ambiguity about which impl to use when monomorphizing code. Today, the rules consist of the simple overlap check described earlier, and the “orphan” check which limits the crates in which impls are allowed to appear (“orphan” refers to an impl in a crate that defines neither the trait nor the types it applies to). The orphan check is needed, in particular, so that overlap cannot be created accidentally when linking crates together.
The design in this RFC heavily revises the overlap check, as described above, but does not propose any changes to the orphan check (which is described in a blog post). Basically, the change to the overlap check does not appear to change the cases in which orphan impls can cause trouble. And a moment’s thought reveals why: if two sibling crates are unaware of each other, there’s no way that they could each provide an impl overlapping with the other, yet be sure that one of those impls is more specific than the other in the overlapping region.
Interaction with lifetimes
A hard constraint in the design of the trait system is that dispatch cannot depend on lifetime information. In particular, we both cannot, and should not allow specialization based on lifetimes:
- We can’t, because when the compiler goes to actually generate code (“trans”), lifetime information has been erased – so we’d have no idea what specializations would soundly apply.
- We shouldn’t, because lifetime inference is subtle and would often lead to counterintuitive results. For example, you could easily fail to get `'static` even if it applies, because inference is choosing the smallest lifetime that matches the other constraints.
To be more concrete, here are some scenarios which should not be allowed:
// Not allowed: trans doesn't know if T: 'static:
trait Bad1 {}
impl<T> Bad1 for T {}
impl<T: 'static> Bad1 for T {}
// Not allowed: trans doesn't know if two refs have equal lifetimes:
trait Bad2<U> {}
impl<T, U> Bad2<U> for T {}
impl<'a, 'b, T, U> Bad2<&'b U> for &'a T {}
But simply naming a lifetime that must exist, without constraining it, is fine:
// Allowed: specializes based on being *any* reference, regardless of lifetime
trait Good {}
impl<T> Good for T {}
impl<'a, T> Good for &'a T {}
In addition, it’s okay for lifetime constraints to show up as long as they aren’t part of specialization:
// Allowed: *all* impls impose the 'static requirement; the dispatch is happening
// purely based on `Clone`
trait MustBeStatic {}
impl<T: 'static> MustBeStatic for T {}
impl<T: 'static + Clone> MustBeStatic for T {}
Going down the rabbit hole
Unfortunately, we cannot easily rule out the undesirable lifetime-dependent specializations, because they can be “hidden” behind innocent-looking trait bounds that can even cross crates:
////////////////////////////////////////////////////////////////////////////////
// Crate marker
////////////////////////////////////////////////////////////////////////////////
trait Marker {}
impl Marker for u32 {}
////////////////////////////////////////////////////////////////////////////////
// Crate foo
////////////////////////////////////////////////////////////////////////////////
extern crate marker;
trait Foo {
fn foo(&self);
}
impl<T> Foo for T {
default fn foo(&self) {
println!("Default impl");
}
}
impl<T: marker::Marker> Foo for T {
fn foo(&self) {
println!("Marker impl");
}
}
////////////////////////////////////////////////////////////////////////////////
// Crate bar
////////////////////////////////////////////////////////////////////////////////
extern crate marker;
pub struct Bar<T>(T);
impl<T: 'static> marker::Marker for Bar<T> {}
////////////////////////////////////////////////////////////////////////////////
// Crate client
////////////////////////////////////////////////////////////////////////////////
extern crate foo;
extern crate bar;
fn main() {
// prints: Marker impl
0u32.foo();
// prints: ???
// the relevant specialization depends on the 'static lifetime
bar::Bar("Activate the marker!").foo();
}
The problem here is that all of the crates in isolation look perfectly innocent. The code in `marker`, `bar` and `client` is accepted today. It’s only when these crates are plugged together that a problem arises – you end up with a specialization based on a `'static` lifetime. And the `client` crate may not even be aware of the existence of the `marker` crate.
If we make this kind of situation a hard error, we could easily end up with a scenario in which plugging together otherwise-unrelated crates is impossible.
Proposal: ask forgiveness, rather than permission
So what do we do? There seem to be essentially two avenues:
- Be maximally permissive in the impls you can write, and then just ignore lifetime information in dispatch. We can generate a warning when this is happening, though in cases like the above, it may be talking about traits that the client is not even aware of. The assumption here is that these “missed specializations” will be extremely rare, so better not to impose a burden on everyone to rule them out.
- Try, somehow, to prevent you from writing impls that appear to dispatch based on lifetimes. The most likely way of doing that is to somehow flag a trait as “lifetime-dependent”. If a trait is lifetime-dependent, it can have lifetime-sensitive impls (like ones that apply only to `'static` data), but it cannot be used when writing specialized impls of another trait.
The downside of (2) is that it’s an additional knob that all trait authors have to think about. That approach is sketched in more detail in the Alternatives section.
What this RFC proposes is to follow approach (1), at least during the initial experimentation phase. That’s the easiest way to gain experience with specialization and see to what extent lifetime-dependent specializations accidentally arise in practice. If they are indeed rare, it seems much better to catch them via a lint than to force the entire world of traits to be explicitly split in half.
To begin with, this lint should be an error by default; we want to get feedback as to how often this is happening before any stabilization.
What this means for the programmer
Ultimately, the goal of the “just ignore lifetimes for specialization” approach is to reduce the number of knobs in play. The programmer gets to use both lifetime bounds and specialization freely.
The problem, of course, is that when using the two together you can get surprising dispatch results:
trait Foo {
fn foo(&self);
}
impl<T> Foo for T {
default fn foo(&self) {
println!("Default impl");
}
}
impl Foo for &'static str {
fn foo(&self) {
println!("Static string slice: {}", self);
}
}
fn main() {
// prints "Default impl", but generates a lint saying that
// a specialization was missed due to lifetime dependence.
"Hello, world!".foo();
}
Specialization is refusing to consider the second impl because it imposes lifetime constraints not present in the more general impl. We don’t know whether these constraints hold when we need to generate the code, and we don’t want to depend on them because of the subtleties of region inference. But we alert the programmer that this is happening via a lint.
Sidenote: for such simple intracrate cases, we could consider treating the impls themselves more aggressively, catching that the `&'static str` impl will never be used and refusing to compile it.
In the more complicated multi-crate example we saw above, the line `bar::Bar("Activate the marker!").foo();` would likewise print `Default impl` and generate a warning. In this case, the warning may be hard for the `client` crate author to understand, since the trait relevant for specialization – `marker::Marker` – belongs to a crate that hasn’t even been imported in `client`. Nevertheless, this approach seems friendlier than the alternative (discussed in Alternatives).
An algorithm for ignoring lifetimes in dispatch
Although approach (1) may seem simple, there are some subtleties in handling cases like the following:
trait Foo { ... }
impl<T: 'static> Foo for T { ... }
impl<T: 'static + Clone> Foo for T { ... }
In this “ignore lifetimes for specialization” approach, we still want the above specialization to work, because all impls in the specialization family impose the same lifetime constraints. The dispatch here purely comes down to `T: Clone` or not. That’s in contrast to something like this:
trait Foo { ... }
impl<T> Foo for T { ... }
impl<T: 'static + Clone> Foo for T { ... }
where the difference between the impls includes a nontrivial lifetime constraint (the `'static` bound on `T`). The second impl should effectively be dead code: we should never dispatch to it in favor of the first impl, because that depends on lifetime information that we don’t have available in trans (and don’t want to rely on in general, due to the way region inference works). We would instead lint against it (probably error by default).
So, how do we tell these two scenarios apart?
- First, we evaluate the impls normally, winnowing to a list of applicable impls.
- Then, we attempt to determine specialization. For any pair of applicable impls `Parent` and `Child` (where `Child` specializes `Parent`), we do the following:
  - Introduce as assumptions all of the where clauses of `Parent`.
  - Attempt to prove that `Child` definitely applies, using these assumptions. Crucially, we do this test in a special mode: lifetime bounds are only considered to hold if they (1) follow from general well-formedness or (2) are directly assumed from `Parent`. That is, a constraint in `Child` that `T: 'static` has to follow either from some basic type assumption (like the type `&'static T`) or from a similar clause in `Parent`.
  - If the `Child` impl cannot be shown to hold under these more stringent conditions, then we have discovered a lifetime-sensitive specialization, and can trigger the lint.
  - Otherwise, the specialization is valid.
Let’s do this for the two examples above.
Example 1
trait Foo { ... }
impl<T: 'static> Foo for T { ... }
impl<T: 'static + Clone> Foo for T { ... }
Here, if we think both impls apply, we’ll start by assuming that `T: 'static` holds, and then we’ll evaluate whether `T: 'static` and `T: Clone` hold. The first evaluation succeeds trivially from our assumption. The second depends on `T`, as you’d expect.
Example 2
trait Foo { ... }
impl<T> Foo for T { ... }
impl<T: 'static + Clone> Foo for T { ... }
Here, if we think both impls apply, we start with no assumption, and then evaluate `T: 'static` and `T: Clone`. We’ll fail to show the former, because it’s a lifetime-dependent predicate, and we don’t have any assumption that immediately yields it.
This should scale to less obvious cases, e.g. using `T: Any` rather than `T: 'static` – because when trying to prove `T: Any`, we’ll find we need to prove `T: 'static`, and then we’ll end up using the same logic as above. It also works for cases like the following:
trait SometimesDep {}
impl SometimesDep for i32 {}
impl<T: 'static> SometimesDep for T {}
trait Spec {}
impl<T> Spec for T {}
impl<T: SometimesDep> Spec for T {}
Using `Spec` on `i32` will not trigger the lint, because the specialization is justified without any lifetime constraints.
Default impls
An interesting consequence of specialization is that impls need not (and in fact sometimes cannot) provide all of the items that a trait specifies. Of course, this is already the case with defaulted items in a trait – but as we’ll see, that mechanism can be seen as just a way of using specialization.
Let’s start with a simple example:
trait MyTrait {
fn foo(&self);
fn bar(&self);
}
impl<T: Clone> MyTrait for T {
default fn foo(&self) { ... }
default fn bar(&self) { ... }
}
impl MyTrait for String {
fn bar(&self) { ... }
}
Here, we’re acknowledging that the blanket impl has already provided definitions for both methods, so the impl for `String` can opt to just re-use the earlier definition of `foo`. This is one reason for the choice of the keyword `default`. Viewed this way, items defined in a specialized impl are optional overrides of those in overlapping blanket impls.
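A usage sketch (assuming the elided method bodies above are filled in with something concrete):

```rust
fn main() {
    let s = String::from("hello");
    s.foo(); // runs the blanket impl's `foo`: the String impl provides no override
    s.bar(); // runs the `bar` defined in the impl for String
}
```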
And, in fact, if we’d written the blanket impl differently, we could force the `String` impl to leave off `foo`:
impl<T: Clone> MyTrait for T {
// now `foo` is "final"
fn foo(&self) { ... }
default fn bar(&self) { ... }
}
Being able to leave off items that are covered by blanket impls means that specialization is close to providing a finer-grained version of defaulted items in traits – one in which the defaults can become ever more refined as more is known about the input types to the traits (as described in the Motivation section). But to fully realize this goal, we need one other ingredient: the ability for the blanket impl itself to leave off some items. We do this by using the `default` keyword at the `impl` level:
trait Add<Rhs=Self> {
type Output;
fn add(self, rhs: Rhs) -> Self::Output;
fn add_assign(&mut self, rhs: Rhs);
}
default impl<T: Clone, Rhs> Add<Rhs> for T {
fn add_assign(&mut self, rhs: Rhs) {
let tmp = self.clone() + rhs;
*self = tmp;
}
}
A subsequent overlapping impl of `Add` where `Self: Clone` can choose to leave off `add_assign`, “inheriting” it from the `default impl` above.
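For instance, a sketch of such a subsequent impl, using a hypothetical `Point` type:

```rust
#[derive(Clone)]
struct Point { x: i32, y: i32 }

impl Add for Point {
    type Output = Point;
    fn add(self, rhs: Point) -> Point {
        Point { x: self.x + rhs.x, y: self.y + rhs.y }
    }
    // `add_assign` is "inherited" from the `default impl` above, since Point: Clone
}
```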
A key point here is that a `default impl` may be incomplete: from the above code, you cannot assume that `T: Add<T>` for any `T: Clone`, because no such complete impl has been provided.
Defaulted items in traits are just sugar for a default blanket impl:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
fn size_hint(&self) -> (usize, Option<usize>) {
(0, None)
}
// ...
}
// desugars to:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
fn size_hint(&self) -> (usize, Option<usize>);
// ...
}
default impl<T> Iterator for T {
fn size_hint(&self) -> (usize, Option<usize>) {
(0, None)
}
// ...
}
Default impls are somewhat akin to abstract base classes in object-oriented languages; they provide some, but not all, of the materials needed for a fully concrete implementation, and thus enable code reuse but cannot be used concretely.
Note that the semantics of `default impl`s and defaulted items in traits is that both are implicitly marked `default` – that is, both are considered specializable. This choice gives a coherent mental model: when you choose not to employ a default, and instead provide your own definition, you are in effect overriding/specializing that code. (Put differently, you can think of default impls as abstract base classes.)
There are a few important details to nail down with the design. This RFC proposes starting with the conservative approach of applying the general overlap rule to default impls, same as with complete ones. That ensures that there is always a clear definition to use when providing subsequent complete impls. It would be possible, though, to relax this constraint and allow arbitrary overlap between default impls, requiring that, whenever a complete impl overlaps with them, for each item there is either a unique “most specific” default impl that applies, or else the complete impl provides its own definition for that item. Such a relaxed approach is much more flexible, probably easier to work with, and can enable more code reuse – but it’s also more complicated, and backwards-compatible to add on top of the proposed conservative approach.
Limitations
One frequent motivation for specialization is broader “expressiveness”, in particular providing a larger set of trait implementations than is possible today.
For example, the standard library currently includes an `AsRef` trait for “as-style” conversions:
pub trait AsRef<T> where T: ?Sized {
fn as_ref(&self) -> &T;
}
Currently, there is also a blanket implementation as follows:
impl<'a, T: ?Sized, U: ?Sized> AsRef<U> for &'a T where T: AsRef<U> {
fn as_ref(&self) -> &U {
<T as AsRef<U>>::as_ref(*self)
}
}
which allows these conversions to “lift” over references, which is in turn important for making a number of standard library APIs ergonomic.
On the other hand, we’d also like to provide the following very simple blanket implementation:
impl<'a, T: ?Sized> AsRef<T> for T {
fn as_ref(&self) -> &T {
self
}
}
The current coherence rules prevent having both impls, however, because they can in principle overlap:
AsRef<&'a T> for &'a T where T: AsRef<&'a T>
Another example comes from the `Option` type, which currently provides two methods for unwrapping while providing a default value for the `None` case:
impl<T> Option<T> {
fn unwrap_or(self, def: T) -> T { ... }
fn unwrap_or_else<F>(self, f: F) -> T where F: FnOnce() -> T { .. }
}
The `unwrap_or` method is more ergonomic but `unwrap_or_else` is more efficient in the case that the default is expensive to compute. The original collections reform RFC proposed a `ByNeed` trait that was rendered unworkable after unboxed closures landed:
trait ByNeed<T> {
fn compute(self) -> T;
}
impl<T> ByNeed<T> for T {
fn compute(self) -> T {
self
}
}
impl<F, T> ByNeed<T> for F where F: FnOnce() -> T {
fn compute(self) -> T {
self()
}
}
impl<T> Option<T> {
fn unwrap_or<U>(self, def: U) -> T where U: ByNeed<T> { ... }
...
}
The trait represents any value that can produce a `T` on demand. But the above impls fail to compile in today’s Rust, because they overlap: consider `ByNeed<F> for F` where `F: FnOnce() -> F`.
There are also some trait hierarchies where a subtrait completely subsumes the functionality of a supertrait. For example, consider `PartialOrd` and `Ord`:
trait PartialOrd<Rhs: ?Sized = Self>: PartialEq<Rhs> {
fn partial_cmp(&self, other: &Rhs) -> Option<Ordering>;
}
trait Ord: Eq + PartialOrd<Self> {
fn cmp(&self, other: &Self) -> Ordering;
}
In cases like this, it’s somewhat annoying to have to provide an impl for both `Ord` and `PartialOrd`, since the latter can be trivially derived from the former. So you might want an impl like this:
impl<T> PartialOrd<T> for T where T: Ord {
fn partial_cmp(&self, other: &T) -> Option<Ordering> {
Some(self.cmp(other))
}
}
But this blanket impl would conflict with a number of others that work to “lift” `PartialOrd` and `Ord` impls over various type constructors like references and tuples, e.g.:
impl<'a, A: ?Sized> Ord for &'a A where A: Ord {
fn cmp(&self, other: & &'a A) -> Ordering { Ord::cmp(*self, *other) }
}
impl<'a, 'b, A: ?Sized, B: ?Sized> PartialOrd<&'b B> for &'a A where A: PartialOrd<B> {
fn partial_cmp(&self, other: &&'b B) -> Option<Ordering> {
PartialOrd::partial_cmp(*self, *other)
}
}
The case where they overlap boils down to:
PartialOrd<&'a T> for &'a T where &'a T: Ord
PartialOrd<&'a T> for &'a T where T: PartialOrd
and there is no implication between either of the where clauses.
There are many other examples along these lines.
Unfortunately, none of these examples are permitted by the revised overlap rule in this RFC, because in none of these cases is one of the impls fully a “subset” of the other; the overlap is always partial.
It’s a shame to not be able to address these cases, but the benefit is a specialization rule that is very intuitive and accepts only very clear-cut cases. The Alternatives section sketches some different rules that are less intuitive but do manage to handle cases like those above.
If we allowed “relaxed” default impls as described above, one could at least use that mechanism to avoid having to give a definition directly in most cases. (So if you had `T: Ord` you could write `impl PartialOrd for T {}`.)
Possible extensions
It’s worth briefly mentioning a couple of mechanisms that one could consider adding on top of specialization.
Inherent impls
It has long been folklore that inherent impls can be thought of as special, anonymous traits that are:
- Automatically in scope;
- Given higher dispatch priority than normal traits.
It is easiest to make this idea work out if you think of each inherent item as implicitly defining and implementing its own trait, so that you can account for examples like the following:
struct Foo<T> { .. }
impl<T> Foo<T> {
fn foo(&self) { .. }
}
impl<T: Clone> Foo<T> {
fn bar(&self) { .. }
}
In this example, the availability of each inherent item is dependent on a distinct `where` clause. A reasonable “desugaring” would be:
#[inherent] // an imaginary attribute turning on the "special" treatment of inherent impls
trait Foo_foo<T> {
fn foo(&self);
}
#[inherent]
trait Foo_bar<T> {
fn bar(&self);
}
impl<T> Foo_foo<T> for Foo<T> {
fn foo(&self) { .. }
}
impl<T: Clone> Foo_bar<T> for Foo<T> {
fn bar(&self) { .. }
}
With this idea in mind, it is natural to expect specialization to work for inherent impls, e.g.:
impl<T, I> Vec<T> where I: IntoIterator<Item = T> {
default fn extend(iter: I) { .. }
}
impl<T> Vec<T> {
fn extend(slice: &[T]) { .. }
}
We could permit such specialization at the inherent impl level. The semantics would be defined in terms of the folklore desugaring above.
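Following the folklore desugaring above, that example could be thought of as the following sketch (the `#[inherent]` attribute and the `Vec_extend` trait are imaginary):

```rust
#[inherent]
trait Vec_extend<T, I> {
    fn extend(iter: I);
}

// the generic, specializable "inherent" item
impl<T, I> Vec_extend<T, I> for Vec<T> where I: IntoIterator<Item = T> {
    default fn extend(iter: I) { .. }
}

// the slice-specific "inherent" item
impl<'a, T> Vec_extend<T, &'a [T]> for Vec<T> {
    fn extend(slice: &'a [T]) { .. }
}
```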
(Note: this example was chosen purposefully: it’s possible to use specialization at the inherent impl level to avoid refactoring the `Extend` trait as described in the Motivation section.)
There are more details about this idea in the appendix.
Super
Continuing the analogy between specialization and inheritance, one could imagine a mechanism like `super` to access and reuse less specialized implementations when defining more specialized ones. While there’s not a strong need for this mechanism as part of this RFC, it’s worth checking that the specialization approach is at least compatible with `super`.

Fortunately, it is. If we take `super` to mean “the most specific impl overlapping with this one”, there is always a unique answer to that question, because all overlapping impls are totally ordered with respect to each other via specialization.
Extending HRTBs
In the Motivation we mentioned the need to refactor the `Extend` trait to take advantage of specialization. It’s possible to work around that need by using specialization on inherent impls (and having the trait impl defer to the inherent one), but of course that’s a bit awkward.
For reference, here’s the refactoring:
// Current definition
pub trait Extend<A> {
fn extend<T>(&mut self, iterable: T) where T: IntoIterator<Item=A>;
}
// Refactored definition
pub trait Extend<A, T: IntoIterator<Item=A>> {
fn extend(&mut self, iterable: T);
}
One problem with this kind of refactoring is that you lose the ability to say that a type `T` is extendable by an arbitrary iterator, because every use of the `Extend` trait has to say precisely what iterator is supported. But the whole point of this exercise is to have a blanket impl of `Extend` for any iterator that is then specialized later.
This points to a longstanding limitation: the trait system makes it possible to ask for any number of specific impls to exist, but not to ask for a blanket impl to exist – except in the limited case of lifetimes, where higher-ranked trait bounds allow you to do this:
trait Trait { .. }
impl<'a> Trait for &'a MyType { .. }
fn use_all<T>(t: T) where for<'a> &'a T: Trait { .. }
We could extend this mechanism to cover type parameters as well, so that you could write:
fn needs_extend_all<T>(t: T) where for<I: IntoIterator<Item=u8>> T: Extend<u8, I> { .. }
Such a mechanism is out of scope for this RFC.
Refining bounds on associated types
The design with `default` makes specialization of associated types an all-or-nothing affair, but it would occasionally be useful to say that all further specializations will at least guarantee some additional trait bound on the associated type. This is particularly relevant for the “efficient inheritance” use case. Such a mechanism can likely be added, if needed, later on.
Drawbacks
Many of the more minor tradeoffs have been discussed in detail throughout. We’ll focus here on the big picture.
As with many new language features, the most obvious drawback of this proposal is the increased complexity of the language – especially given the existing complexity of the trait system. Partly for that reason, the RFC errs on the side of simplicity in the design wherever possible.
One aspect of the design that mitigates its complexity somewhat is the fact that it is entirely opt in: you have to write `default` in an impl in order for specialization of that item to be possible. That means that all the ways we have of reasoning about existing code still hold good. When you do opt in to specialization, the “obviousness” of the specialization rule should mean that it’s easy to tell at a glance which of two impls will be preferred.
On the other hand, the simplicity of this design has its own drawbacks:
-
You have to lift out trait parameters to enable specialization, as in the Extend example above. Of course, this lifting can be hidden behind an additional trait, so that the end-user interface remains idiomatic. The RFC mentions a few other extensions for dealing with this limitation – either by employing inherent item specialization, or by eventually generalizing HRTBs.
-
You can’t use specialization to handle some of the more “exotic” cases of overlap, as described in the Limitations section above. This is a deliberate trade, favoring simple rules over maximal expressiveness.
Finally, if we take it as a given that we want to support some form of “efficient inheritance” as at least a programming pattern in Rust, the ability to use specialization to do so, while also getting all of its benefits, is a net simplifier. The full story there, of course, depends on the forthcoming companion RFC.
Alternatives
Alternatives to specialization
The main alternative to specialization in general is an approach based on negative bounds, such as the one outlined in an earlier RFC. Negative bounds make it possible to handle many of the examples this proposal can’t (the ones in the Limitations section). But negative bounds are also fundamentally closed: they make it possible to perform a certain amount of specialization up front when defining a trait, but don’t easily support downstream crates further specializing the trait impls.
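As a rough sketch (the ! bound syntax comes from that earlier proposal and is not valid Rust today; the traits are invented for illustration), negative bounds avoid overlap by making the impls disjoint up front:
trait Urgent {}

trait Dispatch { fn dispatch(&self); }

// With (hypothetical) negative bounds, the trait author carves up the space
// once and for all: the two impls are disjoint rather than overlapping.
impl<T: Urgent> Dispatch for T {
    fn dispatch(&self) { .. } // fast path
}

impl<T: !Urgent> Dispatch for T {
    fn dispatch(&self) { .. } // generic path
}
Because the split is fixed when these impls are written, a downstream crate cannot later carve out a more specialized case of its own, which is what “fundamentally closed” means here.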
Alternative specialization designs
The “lattice” rule
The rule proposed in this RFC essentially says that overlapping impls must form chains, in which each one is strictly more specific than the last.
This approach can be generalized to lattices, in which partial overlap between impls is allowed, so long as there is an additional impl that covers precisely the area of overlap (the intersection). Such a generalization can support all of the examples mentioned in the Limitations section. Moving to the lattice rule is backwards compatible.
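For example (with traits invented purely for illustration), the lattice rule would accept the following impls, which the chain rule rejects because neither blanket impl is more specific than the other:
trait Named { fn name(&self) -> String; }

trait Greet { fn greet(&self) -> String; }

impl<T: Clone> Greet for T {
    default fn greet(&self) -> String { "hello".to_string() }
}

impl<T: Named> Greet for T {
    default fn greet(&self) -> String { format!("hello, {}", self.name()) }
}

// The third impl covers exactly the intersection of the two blanket impls,
// so any type covered by both has an unambiguous most-specific impl.
impl<T: Clone + Named> Greet for T {
    default fn greet(&self) -> String { format!("hello again, {}", self.name()) }
}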
Unfortunately, the lattice rule (or really, any generalization beyond the proposed chain rule) runs into a nasty problem with our lifetime strategy. Consider the following:
trait Foo {}
impl<T, U> Foo for (T, U) where T: 'static {}
impl<T, U> Foo for (T, U) where U: 'static {}
impl<T, U> Foo for (T, U) where T: 'static, U: 'static {}
The problem is, if we allow this situation to go through typeck, by the time we actually generate code in trans, there is no possible impl to choose. That is, we do not have enough information to specialize, but we also don’t know which of the (overlapping) unspecialized impls actually applies. We can address this problem by making the “lifetime dependent specialization” lint issue a hard error for such intersection impls, but that means that certain compositions will simply not be allowed (and, as mentioned before, these compositions might involve traits, types, and impls that the programmer is not even aware of).
The limitations that the lattice rule addresses are fairly secondary to the main goals of specialization (as laid out in the Motivation), and so, since the lattice rule can be added later, the RFC sticks with the simple chain rule for now.
Explicit ordering
Another, perhaps more palatable alternative would be to take the specialization rule proposed in this RFC, but have some other way of specifying precedence when that rule can’t resolve it – perhaps by explicit priority numbering. That kind of mechanism is usually noncompositional, but due to the orphan rule, it’s at least a crate-local concern. Like the alternative rule above, it could be added backwards compatibly if needed, since it only enables new cases.
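For instance, with an entirely hypothetical attribute (no concrete syntax is proposed here) and traits invented for illustration:
trait Fast {}
trait Small {}

trait Encode { fn encode(&self) -> Vec<u8>; }

// Neither impl is more specific than the other, so the rule in this RFC cannot
// order them; an explicit, crate-local priority could break the tie for types
// that are both Fast and Small.
#[specialization_priority(2)] // hypothetical attribute
impl<T: Fast> Encode for T {
    default fn encode(&self) -> Vec<u8> { .. }
}

#[specialization_priority(1)] // hypothetical attribute
impl<T: Small> Encode for T {
    default fn encode(&self) -> Vec<u8> { .. }
}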
Singleton non-default wins
@pnkfelix suggested the following rule, which allows overlap so long as there is a unique non-default item.
For any given type-based lookup, either:
There are no results (error)
There is only one lookup result, in which case we’re done (regardless of whether it is tagged as default or not),
There is a non-empty set of results with defaults, where exactly one result is non-default – and then that non-default result is the answer, or
There is a non-empty set of results with defaults, where 0 or >1 results are non-default (and that is an error).
This rule is arguably simpler than the one proposed in this RFC, and can accommodate the examples we’ve presented throughout. It would also support some of the cases this RFC cannot, because the default/non-default distinction can be used to specify an ordering between impls when the subset ordering fails to do so. For that reason, it is not forward-compatible with the main proposal in this RFC.
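To make the rule concrete, here is a sketch with a trait invented for the example:
trait Print { fn print(&self); }

// Two overlapping blanket impls, both tagged default; neither is a subset of
// the other, so the chain rule proposed in this RFC would reject the pair.
impl<T: Default> Print for T {
    default fn print(&self) { .. }
}

impl<T: std::fmt::Debug> Print for T {
    default fn print(&self) { .. }
}

// Under "singleton non-default wins", a lookup for u32 finds all three impls
// and picks this one, simply because it is the unique non-default result.
impl Print for u32 {
    fn print(&self) { .. }
}
For a type that is both Default and Debug but has no dedicated impl, the lookup finds two default results and no non-default one, which is an error under this rule.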
The downsides are:
-
Because actual dispatch occurs at monomorphization, errors are generated quite late, and only at use sites, not impl sites. That moves traits much more in the direction of C++ templates.
-
It’s less scalable/compositional: this alternative design forces the “specialization hierarchy” to be flat, in particular ruling out multiple levels of increasingly-specialized blanket impls.
Alternative handling of lifetimes
This RFC proposes a laissez faire approach to lifetimes: we let you write whatever impls you like, then warn you if some of them are being ignored because the specialization is based purely on lifetimes.
The main alternative approach is to make a more “principled” distinction between two kinds of traits: those that can be used as constraints in specialization, and those whose impls can be lifetime dependent. Concretely:
#[lifetime_dependent]
trait Foo {}

// Only allowed to use 'static here because of the lifetime_dependent attribute
impl Foo for &'static str {}

trait Bar { fn bar(&self); }

impl<T> Bar for T {
    // Have to use `default` here to allow specialization
    default fn bar(&self) {}
}

// CANNOT write the following impl, because `Foo` is lifetime_dependent
// and Bar is not.
//
// NOTE: this is what I mean by *using* a trait in specialization;
// we are trying to say a specialization applies when T: Foo holds
impl<T: Foo> Bar for T {
    fn bar(&self) { ... }
}

// CANNOT write the following impl, because `Bar` is not lifetime_dependent
impl Bar for &'static str {
    fn bar(&self) { ... }
}
There are several downsides to this approach:
-
It forces trait authors to consider a rather subtle knob for every trait they write, choosing between two forms of expressiveness and dividing the world accordingly. The last thing the trait system needs is another knob.
-
Worse still, changing the knob in either direction is a breaking change:
  - If a trait gains a lifetime_dependent attribute, any impl of a different trait that used it to specialize would become illegal.
  - If a trait loses its lifetime_dependent attribute, any impl of that trait that was lifetime dependent would become illegal.
-
It hobbles specialization for some existing traits in std.
For the last point, consider From (which is tied to Into). In std, we have the following important “boxing” impl:
impl<'a, E: Error + 'a> From<E> for Box<Error + 'a>
This impl would necessitate From (and therefore, Into) being marked lifetime_dependent. But these traits are very likely to be used to describe specializations (e.g., an impl that applies when T: Into<MyType>).
There does not seem to be any way to consider such impls as lifetime-independent, either, because of examples like the following:
// If we consider this innocent...
trait Tie {}
impl<'a, T: 'a> Tie for (T, &'a u8) {}

// ... we get into trouble here
trait Foo {}
impl<'a, T> Foo for (T, &'a u8) {}
impl<'a, T> Foo for (T, &'a u8) where (T, &'a u8): Tie {}
All told, the proposed laissez faire approach seems a much better bet in practice, but only experience with the feature can tell us for sure.
Unresolved questions
All questions from the RFC discussion and prototype have been resolved.
Appendix
More details on inherent impls
One tricky aspect for specializing inherent impls is that, since there is no
explicit trait definition, there is no general signature that each definition of
an inherent item must match. Thinking about Vec
above, for example, notice
that the two signatures for extend
look superficially different, although it’s
clear that the first impl is the more general of the two.
It’s workable to use a very simple-minded conceptual desugaring: each item desugars into a distinct trait, with type parameters for e.g. each argument and the return type. All concrete type information then emerges from desugaring into impl blocks. Thus, for example:
impl<T, I> Vec<T> where I: IntoIterator<Item = T> {
    default fn extend(iter: I) { .. }
}

impl<T> Vec<T> {
    fn extend(slice: &[T]) { .. }
}

// Desugars to:

trait Vec_extend<Arg, Result> {
    fn extend(Arg) -> Result;
}

impl<T, I> Vec_extend<I, ()> for Vec<T> where I: IntoIterator<Item = T> {
    default fn extend(iter: I) { .. }
}

impl<T> Vec_extend<&[T], ()> for Vec<T> {
    fn extend(slice: &[T]) { .. }
}
All items of a given name must desugar to the same trait, which means that the
number of arguments must be consistent across all impl blocks for a given Self
type. In addition, we’d require that all of the impl blocks overlap (meaning
that there is a single, most general impl). Without these constraints, we would
implicitly be permitting full-blown overloading on both arity and type
signatures. For the time being at least, we want to restrict overloading to
explicit uses of the trait system, as it is today.
This “desugaring” semantics has the benefits of allowing inherent item specialization, and also making it actually be the case that inherent impls are really just implicit traits – unifying the two forms of dispatch. Note that this is a breaking change, since examples like the following are (surprisingly!) allowed today:
struct Foo<A, B>(A, B);

impl<A> Foo<A, A> {
    fn foo(&self, _: u32) {}
}

impl<A, B> Foo<A, B> {
    fn foo(&self, _: bool) {}
}

fn use_foo<A, B>(f: Foo<A, B>) {
    f.foo(true)
}
As has been proposed elsewhere, this “breaking change” could be made available through a feature flag that must be used even after stabilization (to opt in to specialization of inherent impls); the full details will depend on pending revisions to RFC 1122.