- Feature Name:
public_private_dependencies
- Start Date: 2017-04-03
- RFC PR: rust-lang/rfcs#1977
- Rust Issue: rust-lang/rust#44663
Summary
Introduce a public/private distinction to crate dependencies.
Motivation
The crates ecosystem has greatly expanded since Rust 1.0. With that, a few patterns for
dependencies have evolved that challenge the currently existing dependency declaration
system in Cargo and Rust. The most common problem is that a crate A
depends on another
crate B
but some of the types from crate B
are exposed through the API in crate A
.
This causes problems in practice if that dependency B
is also used by the user’s code
itself, crate B
resolves to different versions for each usage, and the values of types
from the two crate B
instances need to be used together but don’t match. In this case,
the user’s code will refuse to compile because different versions of those libraries are
requested, and the compiler messages are less than clear.
The introduction of an explicit distinction between public and private dependencies can solve some of these issues. This distinction should also let us lift some restrictions and make some code compile that previously was prevented from compiling.
Q: What is a public dependency?
A: A dependency is public if some of the types or traits of that dependency are themselves
exported through the public API of main crate. The most common places where this happens
are return values and function parameters. The same applies to trait implementations and
many other things. Because “public” can be tricky to determine for a user, this RFC
proposes to extend the compiler infrastructure to detect the concept of a “public
dependency”. This will help the user understand this concept so they can avoid making
mistakes in the Cargo.toml
.
Effectively, the idea is that if you bump a public dependency’s version, it’s a breaking change of your own crate.
Q: What is a private dependency?
A: On the other hand, a private dependency is contained within your crate and effectively
invisible for users of your crate. As a result, private dependencies can be freely
duplicated in the dependency graph and won’t cause compilation errors. This distinction
will also make it possible to relax some restrictions that currently exist in Cargo which
sometimes prevent crates from compiling.
Q: Can public become private later?
A: Public dependencies are public within a reachable subgraph but can become private if a
crate stops exposing a public dependency. For instance, it is very possible to have a
family of crates that all depend on a utility crate that provides common types which is a
public dependency for all of them. However, if your own crate ends up being a user of this
utility crate but none of its types or traits become part of your own API, then this
utility crate dependency is marked private.
Q: Where is public / private defined?
Dependencies are private by default and are made public through a public
flag on the
dependency in the Cargo.toml
file. This also means that crates created before the
implementation of this RFC will have all their dependencies private.
Q: How is backwards compatibility handled?
A: It will continue to be permissible to “leak” dependencies (and there are even some use
cases of this), however, the compiler or Cargo will emit warnings if private dependencies
are part of the public API. Later, it might even become invalid to publish new crates
without explicitly silencing these warnings or marking the dependencies as public.
Q: Can I export a type from a private dependency as my own?
A: For now, it will not be strictly permissible to privately depend on a crate and export a
type from there as your own. The reason for this is that at the moment it is not possible
to force this type to be distinct. This means that users of the crate might accidentally
start depending on that type to be compatible if the user starts to depend on the crate
that actually implements that type. The limitations from the previous answer apply (e.g.:
you can currently overrule the restrictions).
Q: How do semver and dependencies interact?
A: It is already the case that changing your own dependencies would require a semver bump
for your own library because your API contract to the outside world changes. This RFC,
however, makes it possible to only have this requirement for public dependencies and would
permit Cargo to prevent new crate releases with semver violations.
Detailed design
There are a few areas that need to be changed for this RFC:
- The compiler needs to be extended to understand when crate dependencies are considered a public dependency
- The
Cargo.toml
manifest needs to be extended to support declaring public dependencies. This will start as an unstable cargo feature available on nightly and only via opt-in. - The
public
attribute of dependencies needs to appear in the Cargo index in order to be used by Cargo during version resolution - Cargo’s version resolution needs to change to reject crate graph resolutions where two versions of a crate are publicly reachable to each other.
- The
cargo publish
process needs to be changed to warn (or prevent) the publishing of crates that have undeclared public dependencies cargo publish
will resolve dependencies to the lowest possible versions in order to check that the minimal version specified inCargo.toml
is correct.- Crates.io should show public dependencies more prominently than private ones.
Compiler Changes
The main change to the compiler will be to accept a new parameter that Cargo
supplies which is a list of public dependencies. The flag will be called
--extern-public
. The compiler then emits warnings if it encounters private
dependencies leaking to the public API of a crate. cargo publish
might change
this warning into an error in its lint step.
Additionally, later on, the warning can turn into a hard error in general.
In some situations, it can be necessary to allow private dependencies to become
part of the public API. In that case one can permit this with
#[allow(external_private_dependency)]
. This is particularly useful when
paired with #[doc(hidden)]
and other already existing hacks.
This most likely will also be necessary for the more complex relationship of
libcore
and libstd
in Rust itself.
Changes to Cargo.toml
The Cargo.toml
file will be amended to support the new public
parameter on
dependencies. Old Cargo versions will emit a warning when this key is encountered
but otherwise continue. Since the default for a dependency to be private only,
public ones will need to be tagged which should be the minority.
This will start as an unstable Cargo feature available on nightly only that authors
will need to opt into via a feature specified in Cargo.toml
before Cargo will
start using the public
attribute to change the way versions are resolved. The
Cargo unstable feature will turn on a corresponding rustc unstable feature for
the compiler changes noted above.
Example dependency:
[dependencies]
url = { version = "1.4.0", public = true }
Changes to the Cargo Index
The Cargo index used by Cargo when
resolving versions will contain the public
attribute on dependencies as specified
in Cargo.toml
. For example, an index line for a crate named example
that
publicly depends on the url
crate would look like (JSON prettified for legibility):
{
"name":"example",
"vers":"0.1.0",
"deps":[
{
"name":"url",
"req":"^1.4.0",
"public":"true",
"features":[],
"optional":false,
"default_features":true,
"target":null,
"kind":"normal"
}
]
}
Changes to Cargo Version Resolution
Cargo will specifically reject graphs that contain two different versions of the same crate being publicly depended upon and reachable from each other. This will prevent the strange errors possible today at version resolution time rather than at compile time.
How this will work:
- First, a resolution graph has a bunch of nodes. These nodes are “package ids” which are a triple of (name, source, version). Basically this means that different versions of the same crate are different nodes, and different sources of the same name (e.g. git and crates.io) are also different nodes.
- There are directed edges between nodes. A directed edge represents a dependency. For example if A depends on B then there’s a directed edge from A to B.
- With public/private dependencies, we can now say that every edge is either tagged with public or private.
- This means that we can have a collection of subgraphs purely connected by public dependency edges. The directionality of the public dependency edges within the subgraph doesn’t matter. Each of these subgraphs represents an “ecosystem” of crates publicly depending on each other. These subgraphs are “pools of public types” where if you have access to the subgraph, you have access to all types within that pool of types.
- We can place a constraint that each of these “publicly connected subgraphs” are required to have exactly one version of all crates internally. For example, each subgraph can only have one version of Hyper.
- Finally, we can consider all pairs of edges coming out of one node in the resolution graph. If the two edges point to two distinct publicly connected subgraphs from above and those subgraphs contain two different versions of the same crate, we consider that an error. This basically means that if you privately depend on Hyper 0.3 and Hyper 0.4, that’s an error.
Changes to Cargo Publish: Warnings
When a new crate version is published, Cargo will warn about types and traits that
the compiler determined to be public but did not come from a public dependency. For
now, it should be possible to publish anyways but in some period in the future it
will be necessary to explicitly mark all public dependencies as such or explicitly
mark them with #[allow(external_private_dependency)]
.
Changes to Cargo Publish: Lowest Version Resolution
A very common situation today is that people write the initial version of a dependency in their Cargo.toml, but never bother to update it as they take advantage of new features in newer versions. This works out okay because (1) Cargo will generally use the largest version it can find, compatible with constraints, and (2) upper bounds on constraints (at least within a particular minor version) are relatively rare. That means, in particular, that Cargo.toml is not a fully accurate picture of version dependency information; in general it’s a lower bound at best. There can be “invisible” dependencies that don’t cause resolution failures but can create compilation errors as APIs evolve.
Public dependencies exacerbate the above problem, because you can end up relying on features of a “new API” from a crate you didn’t even know you depended on! For example:
- A depends on:
- B 1.0 which publicly depends on C ^1.0
- D 1.0, which has no dependencies
- When A is initially built, it resolves to B 1.0 and C 1.1.
- Because C’s APIs are available to A via re-exports in B, A effectively depends on C 1.1 now, even though B only claims to depend on C ^1.0
- In particular, the code in A might depend on APIs only available in C 1.1
- However, if A is a library, we don’t check in any lockfile for it, so this information is lost.
- Now we change A to depend on D 1.1, which depends on C =1.0
- A fresh copy of A, when built, will now resolve the crate graph to B 1.0, D 1.1, C 1.0
- But now A may suddenly fail to compile, because it was implicitly depending on C 1.1 features via B.
This example and others like it rely on a common ingredient: a crate somewhere using an API that only is available in a newer version of a crate than the version listed in Cargo.toml.
To attempt to surface this problem earlier, cargo publish
will attempt to resolve
the graph while picking the smallest versions compatible with constraints. If the
crate fails to build with this resolution graph, the publish will fail.
How We Teach This
From the user’s perspective, the initial scope of the RFC will be quite transparent, but it will definitely show up for users as a question of what the new restrictions mean. In particular, a common way to leak out types from APIs that most crates do is error handling. Quite frequently it happens that users wrap errors from other libraries in their own types. It might make sense to identify common cases of where type leakage happens and provide hints in the lint about how to deal with it.
Cases that I anticipate that should be explained separately:
- Type leakage through errors: This should be easy to spot for a lint because the
wrapper type will implement
std::error::Error
. The recommendation should most likely be to encourage wrapping the internal error. - Traits from other crates: In particular, serde and some other common crates will show up frequently. It might make sense to separately explain types and traits.
- Type leakage through derive: Users might not be aware they have a dependency on
a type when they derive a trait (think
serde_derive
). The lint might want to call this out separately.
The feature will be called public_private_dependencies
and it comes with one
lint flag called external_private_dependency
. For all intents and purposes, this
should be the extent of the new terms introduced in the beginning. This RFC, however,
lays the groundwork for later providing aliasing so that a private dependency could
be forcefully re-exported as the crate’s own types. As such, it might make sense to
consider how to refer to this.
It is assumed that this feature will eventually become quite popular due to patterns that already exist in the crate ecosystem. It’s likely that it will evoke some negative opinions initially. As such, it would be a good idea to make a run with cargobomb/crater to see what the actual impact of the new linter warnings is and how far away we are from making them errors.
Crates.io should be updated to render public and private dependencies separately.
End user experience
Author of a crate with one dependency
Assume today that an author of a library crate onedep
has a
dependency on the url
crate and the url::Url
type is exposed in
onedep
’s public API.
onedep
’s Cargo.toml
:
[package]
name = "onedep"
version = "0.1.0"
[dependencies]
url = "1.0.0"
onedep
’s src/lib.rs
:
extern crate url;
use url::Origin;
use std::collections::HashMap;
#[derive(Default)]
pub struct OriginTracker {
origin_counts: HashMap<Origin, usize>,
}
impl OriginTracker {
pub fn log_origin(&mut self, origin: Origin) {
let counter = self.origin_counts.entry(origin).or_insert(0);
*counter += 1;
}
}
When the author of onedep
upgrades Rust/Cargo to a version where this RFC is
completely implemented, the author will notice two changes:
-
When they run
cargo build
, the build will succeed but they will get a warning that a private dependency (theurl
crate specifically) is used in their public API (theurl::Origin
type in thepub fn log_origin
function specifically) and that they should consider addingpublic = true
to theirCargo.toml
. Ideally the warning would say something like:consider changing dependency: ``` url = "1.0.0" ``` to: ``` url = { version = "1.0.0", public = true } ```
The warning could also encourage the author to then bump their crate’s major version since adding public dependencies is a breaking change.
- When they run
cargo publish
, the build check that happens after packaging will fail and the publish will fail. This is because derivingHash
onurl::Origin
wasn’t added until v1.5.1 of the url crate. The author ofonedep
has been runningcargo update
periodically, and theirCargo.lock
has url 1.5.1, but they never updatedCargo.toml
to indicate that they have a new lower bound. Sincecargo publish
will try to resolve dependencies to the lowest possible versions, it will choose version 1.0.0 of the url crate, which doesn’t implementHash
onOrigin
.
There should be a clear error message for this case that indicates Cargo has
resolved crates to their lowest possible versions, that this might be the cause of
the compilation failure, and that the author should investigate the versions of
their dependencies in Cargo.toml
to see if they should be updated. This command
should change the Cargo.lock so that running cargo build
will reproduce the error
for the author to fix.
Author of a crate with multiple dependencies
twodep
’s Cargo.toml
:
[package]
name = "twodep"
version = "0.1.0"
[dependencies]
// this is the version of onedep above using a public dep on url 1.5.1
onedep = "1.0.0"
url = "1.0.0"
twodep
’s src/main.rs
:
extern crate url;
use url::Origin;
extern crate onedep;
fn main() {
let mut origin_tracker = onedep::OriginTracker::default();
loop {
println!("Please enter a URL!");
// pseudocode because I'm lazy
let url = stdin::readline().unwrap();
let url = Url::parse(url).unwrap();
origin_tracker.log_origin(url.origin());
// other stuff
}
println!("Here are all the origins you mentioned: {:#?}", origin_tracker);
}
Before upgrading Rust/Cargo to a version where this RFC has been implemented, this code might have been getting a compilation error if Cargo had resolved the direct dependency on the url crate to a different version than the version of onedep resolved to. Or it might have been resolving and compiling fine if the versions had resolved to be the same.
After upgrading Rust/Cargo, if this code had a compilation error, it would now have
a version resolution problem that cargo would either automatically resolve or prompt
the user to change version constraints/cargo update
to resolve. If the code was
compiling before, that must mean the previous resolution graph was good, so nothing
will change on upgrading.
This crate is a binary and doesn’t have a public API, so it won’t get any warnings about crates not being marked public.
If the author publishes to crates.io after upgrading Rust/Cargo, since onedep’s public dependency on url now has a lower bound of 1.5.1, the only valid graphs that Cargo will generate will be with url 1.5.1 or greater, which is also compatible with the url 1.0.0 direct dependency. Publish will work without any errors or further changes.
Drawbacks
I believe that there are no drawbacks if implemented well (this assumes good linters and error messages).
Alternatives
For me, the biggest alternative to this RFC would be a variation of it where type
and trait aliasing becomes immediately part of it. This would mean that a crate
can have a private dependency and re-export it as its own type, hiding where it
came from originally. This would most likely be easier to teach users and can get
rid of a few “cul-de-sac” situations users can end up in where their only way
out is to introduce a public dependency for now. The assumption is that if trait
and type aliasing is available, the external_public_dependency
would not need to
exist.
Unresolved questions
There are a few open questions about how to best hook into the compiler and Cargo infrastructure:
- What is the impact of this change going to be? This most likely can be answered running cargobomb/crater.
- Since changing public dependency pins/ranges requires a change in semver, it might be worth exploring if Cargo could prevent the user from publishing new crate versions that violate that constraint.
- If this is implemented before the RFC to deprecate
extern crate
, how would this work if you’re not using--extern
?