- Start Date: 2014-10-07
- RFC PR: rust-lang/rfcs#240
- Rust Issue: rust-lang/rust#17863
Summary
This is a conventions RFC for settling the location of `unsafe` APIs relative to the types they work with, as well as the use of `raw` submodules.
The brief summary is:
- Unsafe APIs should be made into methods or static functions in the same cases that safe APIs would be.
- `raw` submodules should be used only to define explicit low-level representations.
Motivation
Many data structures provide unsafe APIs either for avoiding checks or for working directly with their (otherwise private) representation. For example, `string` provides:

- An `as_mut_vec` method on `String` that provides a `Vec<u8>` view of the string. This method makes it easy to work with the byte-based representation of the string, but thereby also allows violation of the utf8 guarantee.
- A `raw` submodule with a number of free functions: `from_parts`, which constructs a `String` instance from a raw-pointer-based representation; a `from_utf8` variant that does not actually check for utf8 validity; and so on. The unifying theme is that all of these functions avoid checking some key invariant.
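The following sketch makes the first item concrete. It targets present-day Rust rather than the 2014 library and relies on `String::as_mut_vec` still being an `unsafe` method, which it is in current std. The helper `push_ascii_byte` is hypothetical. It writes through the `Vec<u8>` view but pushes only ASCII bytes, so the utf8 guarantee holds.

```rust
/// Hypothetical helper: appends one byte through the `Vec<u8>` view of a
/// `String`, refusing any byte that could break the utf8 invariant.
fn push_ascii_byte(s: &mut String, byte: u8) -> bool {
    if !byte.is_ascii() {
        // A lone non-ASCII byte is never a complete utf8 sequence.
        return false;
    }
    // SAFETY: an ASCII byte is a complete utf8 code point, so the
    // string is still valid utf8 after the push.
    unsafe {
        s.as_mut_vec().push(byte);
    }
    true
}

fn main() {
    let mut s = String::from("abc");
    assert!(push_ascii_byte(&mut s, b'd'));
    assert!(!push_ascii_byte(&mut s, 0xFF));
    assert_eq!(s, "abcd");
}
```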
The problem is that there is currently no clear/consistent guideline about which of these APIs should live as methods/static functions associated with a type, and which should live in a `raw` submodule. Both forms appear throughout the standard library.
Detailed design
The proposed convention is:
- When an unsafe function/method is clearly “about” a certain type (as a way of constructing, destructuring, or modifying values of that type), it should be a method or static function on that type. This is the same as the convention for placement of safe functions/methods. So functions like `string::raw::from_parts` would become static functions on `String`.
- `raw` submodules should only be used to define low-level types/representations (and methods/functions on them). Methods for converting to/from such low-level types should be available directly on the high-level types. Examples: `core::raw`, `sync::raw`.
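Current std follows this placement. The constructor once offered as `string::raw::from_parts` is now the static function `String::from_raw_parts`. The sketch below is illustrative only, and its `roundtrip` helper is hypothetical. It takes a `String` apart and rebuilds it using only the `String` type and an `unsafe` block.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: splits a `String` into its raw parts and rebuilds
/// it with the unsafe static function `String::from_raw_parts`.
fn roundtrip(original: String) -> String {
    // Stop the original from freeing the buffer we are about to reuse.
    let mut s = ManuallyDrop::new(original);
    let ptr = s.as_mut_ptr();
    let len = s.len();
    let capacity = s.capacity();
    // SAFETY: the parts come from a live `String` that is never dropped,
    // so the new `String` becomes the sole owner of the allocation.
    unsafe { String::from_raw_parts(ptr, len, capacity) }
}

fn main() {
    let rebuilt = roundtrip(String::from("hello"));
    assert_eq!(rebuilt, "hello");
}
```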
The benefits are:
- Ergonomics. You can gain easy access to unsafe APIs merely by having a value of the type (or, for static functions, importing the type).
- Consistency and simplicity. The rules for placement of unsafe APIs are the same as those for safe APIs.
The perspective here is that marking APIs `unsafe` is enough to deter their use in ordinary situations; they don’t need to be further distinguished by placement into a separate module.
There are also some naming conventions to go along with unsafe static functions and methods:
- When an unsafe function/method is an unchecked variant of an otherwise safe API, it should be marked using an `_unchecked` suffix. For example, `String` should provide both `from_utf8` and `from_utf8_unchecked` constructors, where the latter does not actually check the utf8 encoding. The `string::raw::slice_bytes` and `string::raw::slice_unchecked` functions should be merged into a single `slice_unchecked` method on strings that checks neither bounds nor utf8 character boundaries.
- When an unsafe function/method produces or consumes a low-level representation of a data structure, the API should use `raw` in its name. Specifically, `from_raw_parts` is the typical name for constructing a value from, e.g., a pointer-based representation.
- Otherwise, consider using a name that suggests why the API is unsafe. In some cases, like `String::as_mut_vec`, other stronger conventions apply, and the `unsafe` qualifier on the signature (together with API documentation) is enough.
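These naming rules appear in current std. `String::from_utf8` and `String::from_utf8_unchecked` form a checked/unchecked pair. Unchecked string slicing is available as `str::get_unchecked`, and the older `str::slice_unchecked` is deprecated in its favor. The helpers `decode_checked`, `decode_ascii`, and `prefix` are hypothetical. Each unchecked call is guarded by a check that establishes its safety contract.

```rust
/// Safe constructor: validates the utf8 encoding and reports failure.
fn decode_checked(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Hypothetical helper: calls the `_unchecked` constructor only after an
/// ASCII check, which suffices because ASCII is always valid utf8.
fn decode_ascii(bytes: Vec<u8>) -> Option<String> {
    if !bytes.is_ascii() {
        return None;
    }
    // SAFETY: every ASCII byte sequence is valid utf8.
    Some(unsafe { String::from_utf8_unchecked(bytes) })
}

/// Hypothetical helper: slices without bounds or boundary checks once the
/// equivalent check has been done by hand.
fn prefix(s: &str, end: usize) -> Option<&str> {
    if !s.is_char_boundary(end) {
        return None;
    }
    // SAFETY: `is_char_boundary(end)` implies `end <= s.len()` and that
    // `end` falls on a char boundary; index 0 always does.
    Some(unsafe { s.get_unchecked(..end) })
}

fn main() {
    assert_eq!(decode_checked(b"hi".to_vec()), Some(String::from("hi")));
    assert_eq!(decode_checked(vec![0xFF]), None);
    assert_eq!(decode_ascii(b"hi".to_vec()), Some(String::from("hi")));
    assert_eq!(prefix("héllo", 3), Some("hé"));
    assert_eq!(prefix("héllo", 2), None); // index 2 is inside 'é'
}
```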
The unsafe methods and static functions for a given type should be placed in their own `impl` block, at the end of the module defining the type; this will ensure that they are grouped together in rustdoc. (Thanks @lilyball for the suggestion.)
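Here is a minimal sketch of the grouping convention for a user-defined type. It assumes a hypothetical `AsciiString` whose invariant is that every byte is ASCII. The safe APIs live in the first `impl` block. The unchecked constructor and the raw mutable view follow the naming rules above and sit together in a second `impl` block at the end.

```rust
/// Hypothetical owned string type whose invariant is "every byte is ASCII".
pub struct AsciiString {
    bytes: Vec<u8>,
}

// Safe APIs first.
impl AsciiString {
    /// Checked constructor: rejects input that violates the invariant.
    pub fn from_bytes(bytes: Vec<u8>) -> Option<AsciiString> {
        if bytes.is_ascii() {
            Some(AsciiString { bytes })
        } else {
            None
        }
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: the invariant guarantees ASCII, which is valid utf8.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}

// Unsafe APIs grouped in their own impl block at the end of the module.
impl AsciiString {
    /// Unchecked variant of `from_bytes`.
    ///
    /// # Safety
    /// Every byte in `bytes` must be ASCII.
    pub unsafe fn from_bytes_unchecked(bytes: Vec<u8>) -> AsciiString {
        AsciiString { bytes }
    }

    /// Mutable view of the representation, analogous to `String::as_mut_vec`.
    ///
    /// # Safety
    /// The caller must leave only ASCII bytes in the vector.
    pub unsafe fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        &mut self.bytes
    }
}

fn main() {
    let mut s = AsciiString::from_bytes(b"abc".to_vec()).expect("ascii input");
    // SAFETY: pushing an ASCII byte preserves the invariant.
    unsafe { s.as_mut_vec().push(b'd') };
    assert_eq!(s.as_str(), "abcd");
    assert!(AsciiString::from_bytes(vec![0xFF]).is_none());
    // SAFETY: b"xyz" is ASCII.
    let t = unsafe { AsciiString::from_bytes_unchecked(b"xyz".to_vec()) };
    assert_eq!(t.as_str(), "xyz");
}
```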
Drawbacks
One potential drawback of these conventions is that the documentation for a module will be cluttered with rarely used `unsafe` APIs, whereas the `raw` submodule approach neatly groups these APIs. But rustdoc could easily be changed to either hide or separate out `unsafe` APIs by default, and in the meantime the `impl` block grouping should help.
More specifically, the convention of placing unsafe constructors in `raw` makes them very easy to find. But the usual `from_` convention, together with the naming conventions suggested above, should make it fairly easy to discover such constructors even when they’re supplied directly as static functions.
More generally, these conventions give `unsafe` APIs more equal status with safe APIs. Whether this is a drawback depends on your philosophy about the status of unsafe programming. But on a technical level, the key point is that the APIs are marked `unsafe`, so users still have to opt in to using them. Ed note: from my perspective, low-level/unsafe programming is important to support, and there is no reason to penalize its ergonomics given that it’s opt-in anyway.
Alternatives
There are a few alternatives:
- Rather than providing unsafe APIs directly as methods/static functions, they could be grouped into a single extension trait. For example, the `String` type could be accompanied by a `StringRaw` extension trait providing APIs for working with raw string representations. This would allow a clear grouping of unsafe APIs while still providing them as methods/static functions and allowing them to be imported easily with, e.g., `use std::string::StringRaw`. On the other hand, it penalizes the raw APIs still further (beyond marking them `unsafe`), and given that rustdoc could easily provide API grouping, it’s unclear exactly what the benefit is.
- Use `raw` for functions that construct a value of the type without checking for one or more invariants. The advantage is that it’s easy to find such invariant-ignoring functions. The disadvantage is that their ergonomics are worse, since they must be separately imported or referenced through a lengthy path:

      // Compare the ergonomics:
      string::raw::slice_unchecked(some_string, start, end)
      some_string.slice_unchecked(start, end)

- Another suggestion by @lilyball is to keep the basic structure of `raw` submodules, but use associated types to improve the ergonomics. Details (and discussions of pros/cons) are in this comment.
- Use `raw` submodules to group together all manipulation of low-level representations. No module in `std` currently does this; existing modules provide some free functions in `raw`, and some unsafe methods, without a clear driving principle. The ergonomics of moving everything into free functions in a `raw` submodule are quite poor.
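The placements discussed above can be compared side by side. In this sketch the `raw` module, the `StringRaw` trait, and the method name `slice_unchecked_ext` are hypothetical. The method name was chosen to avoid colliding with std's deprecated `str::slice_unchecked`. The third call uses the real inherent method `str::get_unchecked`, which corresponds to the placement this RFC prefers.

```rust
/// Alternative 1 (hypothetical): a `raw` submodule of free functions.
mod raw {
    /// # Safety
    /// `begin..end` must be in bounds and on char boundaries of `s`.
    pub unsafe fn slice_unchecked(s: &str, begin: usize, end: usize) -> &str {
        // SAFETY: the caller upholds the contract documented above.
        unsafe { s.get_unchecked(begin..end) }
    }
}

/// Alternative 2 (hypothetical): an extension trait grouping unsafe APIs.
mod ext {
    pub trait StringRaw {
        /// # Safety
        /// Same contract as `raw::slice_unchecked`.
        unsafe fn slice_unchecked_ext(&self, begin: usize, end: usize) -> &str;
    }

    impl StringRaw for String {
        unsafe fn slice_unchecked_ext(&self, begin: usize, end: usize) -> &str {
            // SAFETY: the caller upholds the trait method's contract.
            unsafe { self.as_str().get_unchecked(begin..end) }
        }
    }
}

// The extension trait only works once it is explicitly imported.
use ext::StringRaw;

fn main() {
    let s = String::from("hello world");
    // SAFETY (all three calls): 0..5 is in bounds and on char boundaries.
    let a = unsafe { raw::slice_unchecked(&s, 0, 5) };
    let b = unsafe { s.slice_unchecked_ext(0, 5) };
    // Inherent method on `str`: the placement this RFC recommends.
    let c = unsafe { s.get_unchecked(0..5) };
    assert_eq!((a, b, c), ("hello", "hello", "hello"));
}
```

Only the extension-trait version needs an extra `use`, and only the free-function version needs the string passed explicitly. The inherent method is usable as soon as you have a value.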
Unresolved questions
The `core::raw` module provides structs with public representations equivalent
to several built-in and library types (boxes, closures, slices, etc.). It’s not
clear whether the name of this module, or the location of its contents, should
change as a result of this RFC. The module is a special case, because not all of
the types it deals with even have corresponding modules/type declarations – so
it probably suffices to leave decisions about it to the API stabilization
process.