- Feature Name:
is_sorted
- Start Date: 2018-02-24
- RFC PR: rust-lang/rfcs#2351
- Rust Issue: rust-lang/rust#53485
Summary
Add the methods is_sorted
, is_sorted_by
and is_sorted_by_key
to [T]
;
add the methods is_sorted
and is_sorted_by
to Iterator
.
Motivation
In quite a few situations, one needs to check whether a sequence of elements is sorted. The most important use cases are probably unit tests and pre-/post-condition checks.
The lack of an is_sorted()
function in Rust’s standard library has led to
countless programmers implementing their own.
While it is possible to write a one-liner using iterators (e.g.
(0..arr.len() - 1).all(|i| arr[i] <= arr[i + 1])
¹), it is still unnecessary
mental overhead while writing and reading the code.
In the corresponding issue on the main repository (from which a few comments are referenced) everyone seems to agree on the basic premise: we want such a function.
Having is_sorted()
and friends in the standard library would:
- prevent people from spending time on writing their own,
- improve readbility of the code by clearly showing the author’s intent,
- and encourage to write more unit tests and/or pre-/post-condition checks.
Another proof of this functions’ usefulness is the inclusion in the
standard library of many other languages:
C++’s std::is_sorted
,
Go’s sort.IsSorted
,
D’s std.algorithm.sorting.is_sorted
and others. (Curiously, many (mostly) more high-level programming language –
like Ruby, Javascript, Java, Haskell and Python – seem to lack such a function.)
¹ In the initial version of this RFC, this code snippet contained a bug
(<
instead of <=
). This subtle mistake happens very often: in this RFC,
in the discussion thread about this RFC,
in this StackOverflow answer
and in many more places. Thus, avoiding this common bug is another good
reason to add is_sorted()
.
Fast Implementation via SIMD
Lastly, it is possible to implement is_sorted
for many common types with SIMD
instructions which improves speed significantly. It is unlikely that many
programmers will take the time to write SIMD code themselves, thus everyone
would benefit if this rather difficult implementation work is done in the
standard library.
Guide-level explanation
Possible documentation of the two new methods of Iterator
as well as
[T]::is_sorted_by_key
:
fn is_sorted(self) -> bool where Self::Item: PartialOrd,
Checks if the elements of this iterator are sorted.
That is, for each element
a
and its following elementb
,a <= b
must hold. If the iterator yields exactly zero or one element,true
is returned.Note that if
Self::Item
is onlyPartialOrd
, but notOrd
, the above definition implies that this function returnsfalse
if any two consecutive items are not comparable.Example
assert!([1, 2, 2, 9].iter().is_sorted()); assert!(![1, 3, 2, 4).iter().is_sorted()); assert!([0].iter().is_sorted()); assert!(std::iter::empty::<i32>().is_sorted()); assert!(![0.0, 1.0, std::f32::NAN].iter().is_sorted());
fn is_sorted_by<F>(self, compare: F) -> bool where F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>,
Checks if the elements of this iterator are sorted using the given comparator function.
Instead of using
PartialOrd::partial_cmp
, this function uses the givencompare
function to determine the ordering of two elements. Apart from that, it’s equivalent tois_sorted
; see its documentation for more information.
(for
[T]
)fn is_sorted_by_key<F, K>(&self, f: F) -> bool where F: FnMut(&T) -> K, K: PartialOrd,
Checks if the elements of this slice are sorted using the given key extraction function.
Instead of comparing the slice’s elements directly, this function compares the keys of the elements, as determined by
f
. Apart from that, it’s equivalent tois_sorted
; see its documentation for more information.Example
assert!(["c", "bb", "aaa"].is_sorted_by_key(|s| s.len())); assert!(![-2i32, -1, 0, 3].is_sorted_by_key(|n| n.abs()));
The methods [T]::is_sorted
and [T]::is_sorted_by
will have analogous
documentations to the ones shown above.
Reference-level explanation
This RFC proposes to add the following methods to [T]
(slices) and
Iterator
:
impl<T> [T] {
fn is_sorted(&self) -> bool
where
T: PartialOrd,
{ ... }
fn is_sorted_by<F>(&self, compare: F) -> bool
where
F: FnMut(&T, &T) -> Option<Ordering>,
{ ... }
fn is_sorted_by_key<F, K>(&self, f: F) -> bool
where
F: FnMut(&T) -> K,
K: PartialOrd,
{ ... }
}
trait Iterator {
fn is_sorted(self) -> bool
where
Self::Item: PartialOrd,
{ ... }
fn is_sorted_by<F>(mut self, compare: F) -> bool
where
F: FnMut(&Self::Item, &Self::Item) -> Option<Ordering>,
{ ... }
}
In addition to the changes shown above, the three methods added to [T]
should
also be added to core::slice::SliceExt
as they don’t require heap
allocations.
To repeat the exact semantics from the prior section: the methods return
true
if and only if for each element a
and its following element b
, the
condition a <= b
holds. For slices/iterators with zero or one element,
true
is returned. For elements which implement PartialOrd
, but not Ord
,
the function returns false
if any two consecutive elements are not
comparable (this is an implication of the a <= b
condition from above).
A sample implementation can be found here.
Drawbacks
It increases the size of the standard library by a tiny bit.
Rationale and alternatives
Only add the methods to Iterator
, but not to [T]
Without is_sorted()
defined for slices directly, one can still fairly easily
test if a slice is sorted by obtaining an iterator via iter()
. So instead of
v.is_sorted()
, one would need to write v.iter().is_sorted()
.
This always works for is_sorted()
because of the PartialOrd
blanket impl
which implements PartialOrd
for all references to an PartialOrd
type. For
is_sorted_by
it would introduce an additional reference to the closures’
arguments (i.e. v.iter().is_sorted_by(|a, b| ...))
where a
and b
are
&&T
).
While these two inconveniences are not deal-breakers, being able to call those
three methods on slices (and all Deref<Target=[T]>
types) directly, could be
favourable for many programmers (especially given the popularity of slice-like
data structures, like Vec<T>
). Additionally, the sort
method and friends
are defined for slices, thus one might expect the is_sorted()
method there,
too.
Add the three methods to additional data structures (like LinkedList
) as well
Adding these methods to every data structure in the standard library is a lot of
duplicate code. Optimally, we would have a trait that represents sequential
data structures and would only add is_sorted
and friends to said trait. We
don’t have such a trait as of now; so Iterator
is the next best thing. Slices
deserve special treatment due to the reasons mentioned above (popularity and
sort()
).
Iterator::while_sorted
, is_sorted_until
, sorted_prefix
, num_sorted
, …
In the issue on the main repository,
concerns about completely consuming the iterator were raised. Some alternatives,
such as while_sorted
,
were suggested. However, consuming the iterator is neither uncommon nor a
problem. Methods like count()
, max()
and many more consume the iterator,
too. One comment mentions:
I am a bit skeptical of the equivalent on Iterator just because the return value does not seem actionable – you aren’t going to “sort” the iterator after you find out it is not already sorted. What are some use cases for this in real code that does not involve iterating over a slice?
As mentioned above, Iterator
is the next best thing to a trait representing
sequential data structures. So to check if a LinkedList
, VecDeque
or
another sequential data structure is sorted, one would simply call
collection.iter().is_sorted()
. It’s likely that this is the main usage for
Iterator
’s is_sorted
methods. Additionally, code like
if v.is_sorted() { v.sort(); }
is not very useful: sort()
already runs in
O(n) for already sorted arrays.
Suggestions like is_sorted_until
are not really useful either: one can easily
get a subslice or a part of an iterator (via .take()
) and call is_sorted()
on that part.
Unresolved questions
Should Iterator::is_sorted_by_key
be added as well?
This RFC proposes to add is_sorted_by_key
only to [T]
but not to
Iterator
. The latter addition wouldn’t be too useful since once could easily
achieve the same effect as .is_sorted_by_key(...)
by calling
.map(...).is_sorted()
. It might still be favourable to include said function
for consistency and ease of use. The standard library already hosts a number of
sorting-related functions all of which come in three flavours: raw, _by
and
_by_key
. By now, programmers could expect there to be an is_sorted_by_key
as well.
Add std::cmp::is_sorted
instead
As suggested here,
one could also add this free function (plus the _by
and _by_key
versions)
to std::cmp
:
fn is_sorted<C>(collection: C) -> bool
where
C: IntoIterator,
C::Item: Ord,
This can be seen as a better design as it avoids the question about which data
structure should get is_sorted
methods. However, it might have the
disadvantage of being less discoverable and also less convenient (long path or
import).
Require Ord
instead of only PartialOrd
As proposed in this RFC, is_sorted
only requires its elements to be
PartialOrd
. If two non-comparable elements are encountered, false
is
returned. This is probably the only useful way to define the function for
partially orderable elements.
While it’s convenient to call is_sorted()
on slices containing only
partially orderable elements (like floats), we might want to use the stronger
Ord
bound:
- Firstly, for most programmers it’s probably not immediately clear how the function is defined for partially ordered elements (the documentation should be sufficient as explanation, though).
- Secondly, being able to call
is_sorted
on something will probably make most programmers think, that callingsort
on the same thing is possible, too. Having different bounds foris_sorted
andsort
thus might lead to confusion. - Lastly, the
is_sorted_by
function currently uses a closure which returnsOption<Ordering>
. This differs from the closure forsort_by
and looks a bit more complicated than necessary for most cases.