- Feature Name: `raw_identifiers`
- Start Date: 2017-09-14
- RFC PR: rust-lang/rfcs#2151
- Rust Issue: rust-lang/rust#48589
Summary
Add a raw identifier format `r#ident`, so crates written in future language
editions/versions can still use an older API that overlaps with new keywords.
Motivation
One of the primary examples of breaking changes in the edition RFC is to add
new keywords, and specifically `catch` is the first candidate. However, since
the edition RFC also seeks crate compatibility across editions, this would
leave a crate in a newer edition unable to use `catch` identifiers in the API
of a crate in an older edition. @matklad found 28 crates using `catch`
identifiers, some public.
A raw syntax that’s always an identifier would allow these to remain
compatible, so one can write `r#catch` where `catch`-as-identifier is needed.
Guide-level explanation
Although some identifiers are reserved by the Rust language as keywords, it is
still possible to write them as raw identifiers using the `r#` prefix, like
`r#ident`. When written this way, it will always be treated as a plain
identifier equivalent to a bare `ident` name, never as a keyword.
For instance, the following is an erroneous use of the `match` keyword:
fn match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}
error: expected identifier, found keyword `match`
 --> src/lib.rs:1:4
  |
1 | fn match(needle: &str, haystack: &str) -> bool {
  |    ^^^^^
It can instead be written as `fn r#match(needle: &str, haystack: &str)`, using
the `r#match` raw identifier, and the compiler will accept this as a true
`match` function.
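As a minimal sketch of the corrected definition, the block below declares the function with the raw identifier and calls it. The helper `demo_match` is a hypothetical name added here only for illustration; it is not part of this RFC.

```rust
/// Returns true when `needle` occurs in `haystack`.
/// The function's name is `match`; the `r#` prefix makes the parser
/// treat it as an identifier instead of the keyword.
pub fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

/// Hypothetical helper (illustration only): every mention of a name that
/// is a keyword needs the `r#` prefix, including call sites.
pub fn demo_match() -> (bool, bool) {
    (r#match("hay", "haystack"), r#match("needle", "haystack"))
}
```

Note that the call sites need the prefix as well, which is the main ergonomic cost of picking a keyword as a name.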
Generally when defining items, you should just avoid keywords altogether and
choose a different name. Raw identifiers require the `r#` prefix every time
they are mentioned, making them cumbersome for both the developer and users.
Usually an alternative is preferable: `crate` -> `krate`, `const` -> `constant`,
etc.
However, new Rust editions may add to the list of reserved keywords, so that a
formerly legal identifier is now parsed as a keyword. Since compatibility is
maintained between crates of different editions, this could mean that code
written in a new edition might not be able to name an identifier in the API of
another crate. Using a raw identifier, it can still be named and used.
//! baseball.rs in edition 2015
pub struct Ball;
pub struct Player;
impl Player {
    pub fn throw(&mut self) -> Result<Ball> { ... }
    pub fn catch(&mut self, ball: Ball) -> Result<()> { ... }
}
//! main.rs in edition 2018 -- `catch` is now a keyword!
use baseball::*;
fn main() {
    let mut player = Player;
    let ball = player.throw()?;
    player.r#catch(ball)?;
}
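The two-file example above is schematic (elided bodies, a one-parameter `Result`). A self-contained variant, under stated assumptions, might look like the following. The `baseball` module stands in for the older-edition crate, and `with_ball`, `empty_handed`, `play_catch`, the `held` field, and the `String` error type are all hypothetical additions for illustration. In released Rust editions `catch` is not actually reserved, so this sketch compiles either way; it only shows that `r#catch` at the call site names the method declared as plain `catch`.

```rust
// Stand-in for a crate written in an older edition, where `catch` is an
// ordinary identifier. (Assumption: a module replaces the separate crate.)
pub mod baseball {
    #[derive(Debug, PartialEq)]
    pub struct Ball;

    pub struct Player {
        held: Option<Ball>, // hypothetical state, not in the RFC example
    }

    impl Player {
        pub fn with_ball() -> Self {
            Player { held: Some(Ball) }
        }

        pub fn empty_handed() -> Self {
            Player { held: None }
        }

        /// Gives up the held ball, or fails if there is none.
        pub fn throw(&mut self) -> Result<Ball, String> {
            self.held.take().ok_or_else(|| String::from("no ball to throw"))
        }

        /// Declared with the bare name `catch`.
        pub fn catch(&mut self, ball: Ball) -> Result<(), String> {
            if self.held.is_some() {
                return Err(String::from("hands are full"));
            }
            self.held = Some(ball);
            Ok(())
        }
    }
}

/// Stand-in for the newer-edition caller: the method is named with the raw
/// form, which refers to the same identifier as the bare `catch` above.
pub fn play_catch() -> Result<(), String> {
    let mut pitcher = baseball::Player::with_ball();
    let mut catcher = baseball::Player::empty_handed();
    let ball = pitcher.throw()?;
    catcher.r#catch(ball)?;
    Ok(())
}
```

Returning `Result` from the caller, rather than using `?` in a `main` that returns `()`, keeps the error propagation well-typed.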
Reference-level explanation
The syntax for identifiers allows an optional `r#` prefix for a raw identifier,
otherwise following the normal identifier rules. Raw identifiers are always
interpreted as plain identifiers and never as keywords, regardless of context.
They are also treated equivalent to an identifier that wasn’t raw – for
instance, it’s perfectly legal to write:
let foo = 123;
let bar = r#foo * 2;
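The equivalence can be sketched in both directions: a name declared bare can be read back raw, and a name declared raw can be read back bare. For a name that is a keyword, such as `type`, only the raw spelling is available. The function and struct names below are hypothetical, chosen for illustration.

```rust
/// A binding declared bare (`foo`) is read back raw, and a binding
/// declared raw (`r#bar`) is read back bare: both spell the same name.
pub fn raw_and_bare_agree() -> i32 {
    let foo = 123;
    let r#bar = r#foo * 2;
    bar
}

/// `type` is a keyword, so this field can only be written as `r#type`.
pub struct Token {
    pub r#type: &'static str,
}

pub fn token_kind() -> &'static str {
    let token = Token { r#type: "ident" };
    token.r#type
}
```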
Drawbacks
- New syntax is always scary/noisy/etc.
- It might not be intuitively “raw” to a user coming upon this the first time.
Rationale and Alternatives
If we don’t have any way to refer to identifiers that were legal in prior
editions, but later became keywords, then this may hurt interoperability
between crates of different editions. The `r#ident` syntax enables
interoperability, and will hopefully invoke some intuition of being raw,
similar to raw strings.
The `br#ident` syntax is also possible, but I see no advantage over `r#ident`.
Identifiers don’t need the same kind of distinction as `str` and `[u8]`.
A small possible alternative is to also terminate it like `r#ident#`, which
could allow non-identifier characters to be part of a raw identifier. This
could take a cue from raw strings and allow repetition for internal `#`, like
`r##my #1 ident##`. That doesn’t allow a leading `#` or `"` though.
A different possibility is to use backticks for a string-like `` `ident` ``,
like Kotlin, Scala, and Swift. If it allows non-identifier chars, it
could embrace escapes like `\u`, and have a raw-string-identifier
`` r`slash\ident` `` and even ``r#`tick`ident`#``. However, backtick identifiers
are annoying to write in markdown. (e.g. `` `ident` ``)
Backslashes could connote escaping identifiers, like `\ident`, perhaps
surrounded like `\ident\`, `\{ident}`, etc. However, the infix RFC #1579
currently seems to be leaning towards `\op` syntax already.
Alternatives which already start legal tokens, like C#’s `@ident`, Dart’s
`#ident`, or alternate prefixes like `identifier#catch`, all break Macros 1.0
as @kennytm demonstrated:
macro_rules! x {
    (@ $a:ident) => {};
    (# $a:ident) => {};
    ($a:ident # $b:ident) => {};
    ($a:ident) => { should error };
}
x!(@catch);
x!(#catch);
x!(identifier#catch);
x!(keyword#catch);
C# allows Unicode escapes directly in identifiers, which also separates them
from keywords, so both `@class` and `cl\u0061ss` are valid `class` identifiers.
Java also allows Unicode escapes, but they don’t avoid keywords.
For some new keywords, there may be contextual mitigations. In the case of
`catch`, it couldn’t be a fully contextual keyword because `catch { ... }` could
be a struct literal. That context might be worked around with a path, like
`old_edition::catch { ... }` to use an identifier instead. Contexts that don’t
make sense for a `catch` expression can just be identifiers, like `foo.catch()`.
However, this might not be possible for all future keywords.
There might also be a need for raw keywords in the other direction, e.g. so the
older edition can still use the new `catch` functionality somehow. I think this
particular case is already served well enough by `do catch { ... }`, if we
choose to stabilize it that way. Perhaps `br#keyword` could be used for this,
but that may not be a good intuitive relationship.
Unresolved questions
- Do macros need any special care with such identifier tokens?
- Should diagnostics use the `r#` syntax when printing identifiers that overlap
  keywords?
- Does rustdoc need to use the `r#` syntax? e.g. to document
  `pub use old_edition::*`
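On the macro question, one data point from compilers that ship raw identifiers (an observation about current `macro_rules!` behaviour, not something this RFC specifies): an `$x:ident` matcher accepts a raw identifier as a single identifier token and passes it through unchanged. The macro and function names below are hypothetical, for illustration only.

```rust
// Hypothetical macro: defines a zero-argument function with the given name.
macro_rules! make_answer_fn {
    ($name:ident) => {
        pub fn $name() -> u32 {
            42
        }
    };
}

make_answer_fn!(answer); // ordinary identifier
make_answer_fn!(r#loop); // raw identifier, matched by the same `ident` rule

/// Calls both generated functions; the keyword-named one needs `r#`.
pub fn call_both() -> (u32, u32) {
    (answer(), r#loop())
}
```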