- Feature Name: `path_bases`
- Start Date: 2023-11-13
- RFC PR: rust-lang/rfcs#3529
- Rust Issue: rust-lang/cargo#14355
Summary
Introduce a table of path “bases” in Cargo configuration files that can be used
to prefix the path of `path` dependencies and `patch` entries.
This feature will not support declaring path bases in manifest files to avoid additional design complexity, though this may be added in the future.
Motivation
As a project grows in size, it becomes necessary to split it into smaller sub-projects, architected into layers with well-defined boundaries.
One way to enforce these boundaries is to use different Git repos (aka
“multi-repo”). Cargo has good support for multi-repo projects: developers can
use git dependencies, or private registries if they want to
explicitly publish code or need to preprocess their sub-projects (e.g.,
generating code) before they can be consumed.
If all of the code is kept in a single Git repo (aka “mono-repo”), then these
boundaries must be enforced a different way: either leveraging tooling during
the build to check layering, or requiring that sub-projects explicitly publish
and consume from some intermediate directory. Cargo has poor support for
mono-repos: the only viable mechanism is `path` dependencies, but these require
relative paths (which makes refactoring and moving sub-projects very difficult)
and don’t work at all if the mono-repo requires publishing and consuming from an
intermediate directory (as this may vary per host, or per target being built).
This RFC proposes a mechanism to specify path bases in `config.toml` files, which
can be used to prefix the paths of `path` dependencies. This allows mono-repos to specify
dependencies relative to their root directory, which allows the consuming
project to be moved freely (no relative paths to update) and a simple
find-and-replace to handle a producing project being moved. Additionally, a
host-specific or target-specific intermediate directory may be specified as a
`base`, allowing code to be consumed from there using `path` dependencies.
Example
If we had a sub-project that depends on three others:
- `foo`, which is in a different layer of the mono-repo.
- `bar_with_generated`, which must be consumed from an intermediate directory because it contains target-specific generated code.
- `baz`, which is in the current layer.
We may have a `Cargo.toml` snippet that looks like this:
[dependencies]
foo = { path = "../../../other_layer/foo" }
bar_with_generated = { path = "../../../../intermediates/x86_64/Debug/third_layer/bar_with_generated" }
baz = { path = "../baz" }
This has many issues:
- Moving the current sub-project may require changing all of these relative paths.
- `bar_with_generated` will only work if we’re building x86_64 Debug.
- `bar_with_generated` assumes that the `intermediates` directory is a sibling to our source directory, and not somewhere else completely (e.g., a different drive for performance reasons).
- Moving `foo` or `baz` requires searching the code for each possible relative path (e.g., `../../../other_layer/foo` and `../foo`) and may be error prone if there is some other sub-project in a directory with the same name.
Instead, if we could specify these common paths as path bases in a `config.toml`
(which may be generated by an external build system which in turn invokes Cargo):
[path-bases]
sources = "/home/user/dev/src"
intermediates = "/home/user/dev/intermediates/x86_64/Debug"
Then the `Cargo.toml` can use those path bases and avoid relative paths:
[dependencies]
foo = { path = "other_layer/foo", base = "sources" }
bar_with_generated = { path = "third_layer/bar_with_generated", base = "intermediates" }
baz = { path = "this_layer/baz", base = "sources" }
This resolves the issues we previously had:
- The current project can be moved without modifying the `Cargo.toml` at all.
- `bar_with_generated` works for all targets (assuming the `config.toml` is generated).
- The `intermediates` directory can be placed anywhere.
- Moving `foo` or `baz` only requires searching for the canonical form relative to the path base.
Other uses
The ability to use path bases for `path` dependencies is convenient for
developers who are using a large number of `path` dependencies within the same
root directory. Instead of repeating the same path fragment many times in their
`Cargo.toml`, they can specify it once in a `config.toml` as a path
base, then use that path base in each of their `path` dependencies.
Cargo will also provide built-in path bases, for example `workspace` to point to
the root directory of the workspace. This allows workspace members to reference
each other without first needing to `../` their way back to the workspace root.
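As an illustration of the built-in `workspace` base, a member manifest might look like the sketch below. The member location (`crates/app`) and the sibling crate name (`util`) are hypothetical and only serve to show the shape of the entry.

```toml
# crates/app/Cargo.toml (hypothetical workspace member)
[dependencies]
# `path` is taken relative to the directory that contains the workspace's
# root Cargo.toml, so no "../" hops are needed from this member.
util = { path = "crates/util", base = "workspace" }
```

If the member is later moved elsewhere inside the workspace, this entry would not need to change.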
Guide-level explanation
If you often use multiple path dependencies that have a common parent directory,
or if you want to avoid putting long paths in your `Cargo.toml`, you can
define path base directories in your configuration.
Your path dependencies can then be specified relative to those base
directories.
For example, say you have a number of projects checked out in
`/home/user/dev/rust/libraries/`. Rather than use that path in your
`Cargo.toml` files, you can define it as a “base” path in
`~/.cargo/config.toml`:
[path-bases]
dev = "/home/user/dev/rust/libraries/"
Now, you can specify a path dependency on a library `foo` in that
directory in your `Cargo.toml` using:
[dependencies]
foo = { path = "foo", base = "dev" }
Like with other path dependencies, keep in mind that both the base and
the path must exist on any other host where you want to use the same
`Cargo.toml` to build your project.
You can also use `base` along with `path` when specifying a `[patch]`.
Specifying a `path` and `base` on a `[patch]` is equivalent to specifying just a
`path` containing the full path including the prepended base.
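A minimal sketch of a patch entry using a base, assuming the `dev` path base from the example above is defined in the configuration and that the crate being patched comes from crates.io:

```toml
# Cargo.toml
[patch.crates-io]
# With `dev = "/home/user/dev/rust/libraries/"` in [path-bases], this is
# equivalent to: foo = { path = "/home/user/dev/rust/libraries/foo" }
foo = { path = "foo", base = "dev" }
```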
Reference-level explanation
Specifying Dependencies
Path Bases
A `path` dependency may optionally specify a base by setting the `base` key to
the name of a path base: either one declared in the `[path-bases]` table of the
configuration, or one of the built-in path bases. The value of that
path base is prepended to the `path` value (along with a path separator if
necessary) to produce the actual location where Cargo will look for the
dependency.
For example, if the `Cargo.toml` contains:
[dependencies]
foo = { path = "foo", base = "dev" }
Given a `[path-bases]` table in the configuration that contains:
[path-bases]
dev = "/home/user/dev/rust/libraries/"
This will produce a `path` dependency `foo` located at
`/home/user/dev/rust/libraries/foo`.
Path bases can be either absolute or relative. Relative path bases are relative to the parent directory of the configuration file that declared that path base.
The name of a path base must use only alphanumeric characters or `-` or `_`,
must start with an alphabetic character, and must not be empty.
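To make the naming rule concrete, here is a small sketch. All names and paths are hypothetical, and the rejected entries are left commented out because Cargo would be expected to refuse them under the rule as stated.

```toml
[path-bases]
dev = "/home/user/dev"                   # valid: letters only
third-party_2 = "/home/user/third_party" # valid: starts with a letter; uses a digit, `-`, and `_`
# 2fast = "/home/user/fast"              # invalid: starts with a digit
# "my libs" = "/home/user/libs"          # invalid: a space is not an allowed character
```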
If the name of the path base used in a dependency is neither declared in the configuration nor one of the built-in path bases, then Cargo will raise an error.
Built-in path bases
Cargo provides implicit path bases that can be used without the need to specify
them in a `[path-bases]` table.
- `workspace`: If a project is a workspace or workspace member, then this path base is defined as the parent directory of the root `Cargo.toml` of the workspace.
If a built-in path base name is also declared in the configuration, then Cargo will prefer the value in the configuration. This allows Cargo to add new built-in path bases without compatibility issues (as existing uses will shadow the built-in name).
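The shadowing rule can be sketched as follows. The directory is a hypothetical value; the point is only that a configuration entry named `workspace` takes precedence over the built-in one.

```toml
# .cargo/config.toml
[path-bases]
# Declared in the configuration, so dependencies that use `base = "workspace"`
# resolve against this directory instead of the built-in workspace root.
workspace = "/home/user/dev/src"
```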
Configuration
[path-bases]
- Type: string
- Default: see below
- Environment: `CARGO_PATH_BASES_<name>`
The `[path-bases]` table defines a set of path prefixes that can be
prepended to the locations of `path` dependencies. See the
specifying dependencies documentation for more information.
cargo add
Synopsis
`cargo add` [options] `--path` path [`--base` base]
Options
Source options
`--base` base
The path base to use when adding from a local crate.
Workspaces
Path bases can be used in a workspace’s `[workspace.dependencies]` table.
If a member is inheriting a dependency (i.e., using `workspace = true`) then the
`base` key cannot also be specified for that dependency in the member manifest.
That is, the member will use the `path` dependency as specified in the workspace
manifest and has no ability to override the base path being used (if any).
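A sketch of how inheritance would interact with `base`, assuming a path base named `dev` is defined in the configuration. The member location `crates/app` is hypothetical, and both manifests are shown in one block for brevity.

```toml
# Workspace root Cargo.toml
[workspace]
members = ["crates/app"]

[workspace.dependencies]
# The base is chosen once, here in the workspace manifest.
foo = { path = "foo", base = "dev" }

# crates/app/Cargo.toml
[dependencies]
# Inherits both `path` and `base` from the workspace entry.
foo = { workspace = true }
# Not allowed: a member cannot add or override `base` on an inherited dependency.
# foo = { workspace = true, base = "other" }
```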
Drawbacks
- There is now an additional way to specify a dependency in `Cargo.toml` that may not be accessible when others try to build the same project. Specifically, it may now be that the other host has a `path` dependency available at the same relative path to `Cargo.toml` as the author of the `Cargo.toml` entry, but does not have the path base defined (or has it defined as some other value).
  At the same time, this might make path dependencies more re-usable across hosts, since developers can dictate only which bases need to exist, rather than which paths need to exist. This would allow different developers to host their path dependencies in different locations from the original author.
- Developers still need to know the path within each path base. We could instead define path “aliases”, though at that point the whole thing looks more like a special kind of “local path registry”.
- This introduces yet another mechanism for grouping local dependencies. We already have local registries, directory registries, and the `[paths]` override. However, those are all intended for immutable local copies of dependencies where versioning is enforced, rather than as mutable path dependencies.
Rationale and alternatives
This design was primarily chosen for its simplicity — it adds very little to what we have today both in terms of API surface and mechanism. But, other approaches exist.
Developers could have their `path` dependencies point to symlinks in the
current directory, which other developers would then be told to set up
to point to the appropriate place on their system. This approach has two
main drawbacks: symlinks are harder to use on Windows as they require
special privileges, and they pollute the user’s project directory.
For the build-system case, the build system could place vendored
dependencies directly into the source directory at well-known locations,
though this would mean that if the source of those dependencies were to
change, the user would have to re-run the build system (rather than just
run `cargo`) to refresh the vendored dependency. And this approach too
would end up polluting the user’s source directory.
An earlier iteration of the design avoided adding a new field to
dependencies, and instead inlined the base name into the path using
`path = "base::relative/path"`. This has the advantage of not
introducing another special keyword in `Cargo.toml`, but comes at the
cost of making `::` illegal in paths, which was deemed too great.
Alternatively, we could add support for interpolating environment
variables (or arbitrary configuration values?) in `Cargo.toml` values.
That way, the path could be given as `path = "${base.name}/relative/path"`.
While that works, it’s not trivially
backwards compatible, may be confusing when users try to interpolate
random other configuration variables in their paths, and seems like a
possible Pandora’s box of corner-cases.
The `[paths]` feature
could be updated to lift its current limitations around adding
dependencies and requiring that the dependencies be available on
crates.io. This would allow users to avoid `path` dependencies in more
cases, but makes the replacement more implicit than explicit. That
change is also more likely to break existing users, and to involve
significant refactoring of the existing mechanism.
We could add another type of local registry that is explicitly declared
in `Cargo.toml`, and from which local dependencies could then be drawn.
Something like:
[registry.local]
path = "/path/to/path/registry"
This would make specifying the dependencies somewhat nicer
(`version = "1", registry = "local"`), and would ensure a standard layout for the
locations of the local dependencies. However, using local dependencies
in this manner would require more set-up to arrange for the right
registry layout, and we would be introducing what is effectively a
mutable registry, which Cargo has avoided thus far.
Even with such an approach, there are benefits to being able to not put
complex paths into `Cargo.toml` as they may differ on other build hosts.
So, a mechanism for indirecting through a path name may still be
desirable.
Ultimately, by not having a mechanism to name paths that lives outside
of `Cargo.toml`, we are forcing developers to coordinate their file
system layouts without giving them a mechanism for doing so, or to work
around the lack of a mechanism by requiring developers to add symlinks
in strategic locations, cluttering their directories. The proposed
mechanism is simple to understand and to use, and still covers a wide
variety of use-cases.
Support for declaring path bases in the manifest
Currently, path bases can only be declared in the configuration, not in the manifest. While it would be possible to add support for declaring path bases in the manifest in the future (which would require specifying whether the declaration in the manifest or the configuration is preferred, and how workspace versus member declarations work), it is hard to justify the additional complexity of adding this capability to the initial implementation of the feature.
An argument could be made that specifying path bases in the manifest is a convenience feature, allowing a common path where multiple local dependencies exist to be specified as a path base so that the individual path dependencies would be shorter. However, it would be just as easy to add a configuration file to some parent directory of the dependent package. This would also be more useful: those dependencies are likely to be used by other local packages as well, so a shared configuration file avoids duplicating the path bases table in multiple manifests.
It could also be argued that specifying path bases in the manifest would be a
way to set “default values” for path dependencies (e.g., to a submodule) that a
developer could override in their local configuration file. While this may be
useful, this scenario is already taken care of by the `patch` feature in Cargo.
Prior art
Python searches for dependencies by walking `sys.path` in definition
order, which is pulled from the current directory, `PYTHONPATH`, and a list
of system-wide library directories. All imports are thus “relative” to every
directory in `sys.path`. This makes it easy to inject local development
dependencies simply by injecting a path early in `sys.path`. The path
dependency is never made explicit anywhere in Python. We could adopt a similar
approach by declaring an environment variable `CARGO_PATHS`, where every
`path` is considered relative to each path in `CARGO_PATHS` until a path
that exists is found. However, this introduces additional possibilities
for user confusion if, say, `foo` exists in multiple paths in
`CARGO_PATHS` and the first one is picked (though maybe that could be a
warning?).
NodeJS (with npm) is very similar to Python, except that dependencies
can also be specified using relative paths like Cargo’s `path`
dependencies. For non-path dependencies, it searches in `node_modules/`
in every parent directory, as well as in the `NODE_PATH` search path.
There does not exist a standard mechanism to specify a path dependency
relative to a path named elsewhere. With CommonJS modules, JavaScript
developers are able to interpolate variables directly into their
`require` arguments, and can thus implement custom schemes for getting
customizable paths.
Ruby’s `Gemfile` `path` dependencies are only ever
absolute paths or paths relative to the `Gemfile`’s location, and so are
similar to Rust’s current `path` dependencies.
The same is the case for Go’s `go.mod` replacement dependencies,
which only allow absolute or relative paths.
From this, it’s clear that other major languages do not have a feature quite like this. This is likely because path dependencies are assumed to be short-lived and local, and thus having them be host-specific is often good enough. However, as the motivation section of this RFC outlines, there are still use-cases where a simple name-indirection could help.
Unresolved questions
- What exact names should we use for the table (`path-bases`) and field (`base`)?
- What other built-in base paths could be useful?
  - `package` or `current-dir` for the directory of the current project?
  - `home` or `user_home` for the user’s home directory?
  - `sysroot` for the current rustc sysroot?
Future possibilities
Add support for declaring path bases in the manifest
As mentioned above, declaring path bases is only supported in the configuration.
Support could be added to declare path bases in the manifest, but the following design questions need to be answered:
- Is `[path-bases]` a package or a workspace field?
- If it is a package field, would it support workspace inheritance? Or would we introduce a new mechanism (e.g., one version of the RFC introduced a “search order” such that Cargo would search for a path base in the package manifest, then the workspace manifest, then the configuration and finally the built-in list)?
- Would a relative path base in the workspace manifest be relative to that manifest, or to the package that uses it?
- If using inheritance, should path bases be implicitly or explicitly inherited (e.g., requiring `[path-bases] workspace = true`)?
Path bases relative to other path bases
We could allow defining a path base relative to another path base:
[path-bases]
base1 = "/dev/me"
base2 = { base = "base1", path = "some_subdir" } # /dev/me/some_subdir
Path dependency with just a base
We could allow defining a path dependency with just `base`, making
`cratename = { base = "thebase" }` equivalent to
`cratename = { base = "thebase", path = "cratename" }`. This would simplify many
common cases, where crates appear within the base in a directory named for the
crate.
Git dependencies
It seems reasonable to extend path bases to `git` dependencies, with something
like:
[path-bases]
gh = "https://github.com/jonhoo"
[dependencies]
foo = { git = "foo.git", base = "gh" }
However, this may get complicated if someone specifies `git`, `path`, and
`base`.