2. Lexical Elements
Note
The contents of this section are informational.
2:1 The text of a Rust program consists of modules organized into source files. The text of a source file is a sequence of lexical elements, each composed of characters, whose rules are presented in this chapter.
2.1. Character Set
2.1:1 The program text of a Rust program is written using the Unicode character set.
Syntax
2.1:2 A character is defined by this document for each cell in the coding space described by Unicode, regardless of whether or not Unicode allocates a character to that cell.
2.1:3 A whitespace character is one of the following characters:
2.1:4 0x09 (horizontal tabulation)
2.1:5 0x0A (new line)
2.1:6 0x0B (vertical tabulation)
2.1:7 0x0C (form feed)
2.1:8 0x0D (carriage return)
2.1:9 0x20 (space)
2.1:10 0x85 (next line)
2.1:11 0x200E (left-to-right mark)
2.1:12 0x200F (right-to-left mark)
2.1:13 0x2028 (line separator)
2.1:14 0x2029 (paragraph separator)
2.1:15 A whitespace string is a string that consists of one or more whitespace characters.
2.1:16 An AsciiCharacter is any Unicode character in the range 0x00 - 0x7F, both inclusive.
Legality Rules
2.1:17 The coded representation of a character is tool-defined.
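As an informal illustration of 2.1:3, 2.1:15, and 2.1:16, the whitespace set and the AsciiCharacter range can be written as Rust predicates. The function names below are hypothetical and chosen only for this sketch. The set in 2.1:3 is not the one used by char::is_whitespace in the Rust standard library, which follows the Unicode White_Space property: that property includes 0xA0 (no-break space) but excludes 0x200E and 0x200F.

```rust
/// Returns true if `c` is one of the whitespace characters listed in 2.1:3.
/// (Hypothetical helper, not part of any library.)
pub fn is_lexical_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{09}' // horizontal tabulation
            | '\u{0A}' // new line
            | '\u{0B}' // vertical tabulation
            | '\u{0C}' // form feed
            | '\u{0D}' // carriage return
            | '\u{20}' // space
            | '\u{85}' // next line
            | '\u{200E}' // left-to-right mark
            | '\u{200F}' // right-to-left mark
            | '\u{2028}' // line separator
            | '\u{2029}' // paragraph separator
    )
}

/// A whitespace string (2.1:15): one or more whitespace characters.
pub fn is_whitespace_string(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_lexical_whitespace)
}

/// An AsciiCharacter (2.1:16): any character in 0x00 through 0x7F.
pub fn is_ascii_character(c: char) -> bool {
    c.is_ascii()
}
```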
2.2. Lexical Elements, Separators, and Punctuation
Syntax
LexicalElement ::= Comment | Identifier | Keyword | Literal | Punctuation
Punctuation ::= Delimiter | + | - | * | / | % | ^ | ! | & | | | && | || | << | >> | += | -= | *= | /= | %= | ^= | &= | |= | <<= | >>= | = | == | != | > | < | >= | <= | @ | _ | . | .. | ... | ..= | , | ; | : | :: | -> | => | # | $ | ?
Delimiter ::= { | } | [ | ] | ( | )
Legality Rules
2.2:1 The text of a source file is a sequence of separate lexical elements. The meaning of a program depends only on the particular sequence of lexical elements, excluding non-doc comments.
2.2:2 A lexical element is the most basic syntactic element in program text.
2.2:3 The text of a source file is divided into lines.
2.2:4 A line is a sequence of zero or more characters followed by an end of line.
2.2:5 The representation of an end of line is tool-defined.
2.2:6 A separator is a character or a string that separates adjacent lexical elements. A whitespace string is a separator.
2.2:7 A simple punctuator is one of the following special characters:
+
-
*
/
%
^
!
&
|
=
>
<
@
_
.
,
;
:
#
$
?
{
}
[
]
(
)
2.2:8 A compound punctuator is one of the following two or more adjacent special characters:
&& || << >> += -= *= /= %= ^= &= |= <<= >>= == != >= <= .. ... ..= :: -> =>
2.2:9 The following compound punctuators are flexible compound punctuators.
&& || << >>
2.2:10 A flexible compound punctuator may be treated as a single compound punctuator or two adjacent simple punctuators.
2.2:11 Each of the special characters listed for simple punctuators is a simple punctuator, except when the character is used as part of a compound punctuator, a character literal, a comment, a numeric literal, or a string literal.
2.2:12 The following names are used when referring to punctuators:
2.2:13 Punctuator  Name
2.2:14 +           Plus
2.2:15 -           Minus
2.2:16 *           Star
2.2:17 /           Slash
2.2:18 %           Percent
2.2:19 ^           Caret
2.2:20 !           Not
2.2:21 &           And
2.2:22 |           Or
2.2:23 &&          And and, lazy boolean and
2.2:24 ||          Or or, lazy boolean or
2.2:25 <<          Shift left
2.2:26 >>          Shift right
2.2:27 +=          Plus equals
2.2:28 -=          Minus equals
2.2:29 *=          Star equals
2.2:30 /=          Slash equals
2.2:31 %=          Percent equals
2.2:32 ^=          Caret equals
2.2:33 &=          And equals
2.2:34 |=          Or equals
2.2:35 <<=         Shift left equals
2.2:36 >>=         Shift right equals
2.2:37 =           Equals
2.2:38 ==          Equals equals, logical equality
2.2:39 !=          Not equals
2.2:40 >           Greater than
2.2:41 <           Less than
2.2:42 >=          Greater than equals, greater than or equal to
2.2:43 <=          Less than equals, less than or equal to
2.2:44 @           At
2.2:45 _           Underscore
2.2:46 .           Dot
2.2:47 ..          Dot dot, exclusive range
2.2:48 ...         Dot dot dot, ellipsis
2.2:49 ..=         Dot dot equals, inclusive range
2.2:50 ,           Comma
2.2:51 ;           Semicolon
2.2:52 :           Colon
2.2:53 ::          Colon colon, path separator
2.2:54 ->          Right arrow
2.2:55 =>          Fat arrow, Hashrocket
2.2:56 #           Pound
2.2:57 $           Dollar sign
2.2:58 ?           Question mark
2.2:59 {           Left curly brace
2.2:60 }           Right curly brace
2.2:61 [           Left square bracket
2.2:62 ]           Right square bracket
2.2:63 (           Left parenthesis
2.2:64 )           Right parenthesis
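To illustrate the flexible compound punctuators of 2.2:9 and 2.2:10, the sketch below (the function name flexible_punctuators is hypothetical) uses the character pairs >>, &&, and || in positions where rustc splits them into two simple punctuators, and then uses >> and && where they act as single compound punctuators.

```rust
/// Hypothetical demonstration of flexible compound punctuators.
pub fn flexible_punctuators() -> (usize, u8, bool) {
    // `>>` closes two generic argument lists: two simple `>` punctuators.
    let nested: Vec<Vec<u8>> = vec![vec![1, 2], vec![3]];
    // `&&` in type and expression position: two `&` reference punctuators.
    let r: &&Vec<Vec<u8>> = &&nested;
    // `||` as the empty parameter list of a closure: two `|` punctuators.
    let seven = || 7;
    // Here `>>` is a single compound punctuator: shift right.
    let shifted = 16u8 >> 2;
    // Here `&&` is a single compound punctuator: lazy boolean and.
    let both = shifted == 4 && seven() == 7;
    (r.len(), shifted, both)
}
```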
2.3. Identifiers
Syntax
Identifier ::= NonKeywordIdentifier | RawIdentifier
IdentifierList ::= Identifier (, Identifier)* ,?
NonKeywordIdentifier ::= PureIdentifier | WeakKeyword
RawIdentifier ::= r# (PureIdentifier | RawIdentifierKeyword)
PureIdentifier ::= XID_Start XID_Continue* | _ XID_Continue+
IdentifierOrUnderscore ::= Identifier | _
Renaming ::= as IdentifierOrUnderscore
2.3:1 A RawIdentifierKeyword is any keyword in category Keyword, except crate, self, Self, and super.
2.3:2 XID_Start and XID_Continue are defined in Unicode Standard Annex #31.
Legality Rules
2.3:3 An identifier is a lexical element that refers to a name.
2.3:4 A pure identifier is an identifier that does not include weak keywords.
2.3:5 A pure identifier shall follow the specification in Unicode Standard Annex #31 for Unicode version 16.0, with the following profile:
2.3:6 Start = XID_Start, plus character 0x5F (low line).
2.3:7 Continue = XID_Continue
2.3:8 Medial = empty
2.3:9 Characters 0x200C (zero width non-joiner) and 0x200D (zero width joiner) shall not appear in a pure identifier.
2.3:10 A pure identifier shall be restricted to characters in category AsciiCharacter in the following contexts:
2.3:11 Crate imports,
2.3:12 Names of external crates represented in a simple path, when the simple path starts with namespace qualifier ::,
2.3:13 Names of outline modules that lack attribute path,
2.3:14 Names of items that are subject to attribute no_mangle,
2.3:15 Names of items within external blocks.
2.3:16 Identifiers are normalized using Normalization Form C as defined in Unicode Standard Annex #15.
2.3:17 Two identifiers are considered the same if they consist of the same sequence of characters after performing normalization.
2.3:18 Declarative macros and procedural macros shall receive normalized identifiers in their input.
Examples
foo
_identifier
r#true
Москва
東京
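The example identifiers above can be exercised in a short sketch; the function names are hypothetical. r#true is a raw identifier that uses the strict keyword true as a function name, and the Cyrillic and CJK names are pure identifiers under 2.3:5. The allow attribute only silences rustc's naming-style lint, because Москва begins with an uppercase letter.

```rust
/// `r#true` is a raw identifier: the strict keyword `true` used as a name.
pub fn r#true() -> bool {
    true
}

// `Москва` starts with an uppercase letter, so rustc's style lint would
// otherwise warn about a non-snake-case variable name.
#[allow(non_snake_case)]
pub fn sum_of_identifiers() -> u32 {
    let foo = 1;
    let _identifier = 2;
    let Москва = 3; // Cyrillic letters are XID_Start / XID_Continue
    let 東京 = 4; // so are CJK ideographs
    foo + _identifier + Москва + 東京
}
```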
2.4. Literals
Syntax
Literal ::= BooleanLiteral | ByteLiteral | ByteStringLiteral | CStringLiteral | CharacterLiteral | NumericLiteral | StringLiteral
Legality Rules
2.4:1 A literal is a fixed value in program text.
2.4.1. Byte Literals
Syntax
ByteLiteral ::= b' ByteContent '
ByteContent ::= ByteCharacter | ByteEscape
ByteEscape ::= \0 | \" | \' | \t | \n | \r | \\ | \x HexadecimalDigit HexadecimalDigit
2.4.1:1 A ByteCharacter is any character in category AsciiCharacter except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).
Legality Rules
2.4.1:2 A byte literal is a literal that denotes a fixed byte value.
2.4.1:3 The type of a byte literal is u8.
Examples
b'h'
b'\n'
b'\x1B'
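A minimal sketch of the byte literal examples above (the function name is hypothetical), showing that each literal has type u8 and the byte value it denotes:

```rust
/// The byte literals from the examples above; each has type `u8`.
pub fn example_bytes() -> [u8; 3] {
    let letter: u8 = b'h'; // 0x68
    let newline = b'\n'; // ByteEscape for 0x0A
    let escape = b'\x1B'; // hexadecimal ByteEscape for 0x1B (ESC)
    [letter, newline, escape]
}
```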
2.4.2. Byte String Literals
Syntax
ByteStringLiteral ::= RawByteStringLiteral | SimpleByteStringLiteral
Legality Rules
2.4.2:1 A byte string literal is a literal that consists of multiple AsciiCharacters.
2.4.2:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a byte string literal.
2.4.2.1. Simple Byte String Literals
Syntax
SimpleByteStringLiteral ::= b" SimpleByteStringContent* "
SimpleByteStringContent ::= ByteEscape | SimpleByteStringCharacter | StringContinuation
2.4.2.1:1 A SimpleByteStringCharacter is any character in category AsciiCharacter except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).
Legality Rules
2.4.2.1:2 A simple byte string literal is a byte string literal that consists of multiple AsciiCharacters.
2.4.2.1:3 The type of a simple byte string literal of size N is &'static [u8; N].
Examples
b""
b"a\tb"
b"Multi\
line"
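The simple byte string examples above can be checked with the following sketch (hypothetical function name). It shows the &'static [u8; N] types, with N counted after escape processing. The removal of the indentation after a StringContinuation is rustc behavior that this section does not itself state.

```rust
/// The simple byte string literals from the examples above. Each has
/// type `&'static [u8; N]`, with N counted after escapes are processed.
pub fn example_byte_strings() -> (&'static [u8; 0], &'static [u8; 3], &'static [u8; 9]) {
    let empty = b"";
    let with_tab = b"a\tb"; // `\t` contributes the single byte 0x09
    // The StringContinuation removes the new line; rustc also skips the
    // whitespace that begins the next line, leaving b"Multiline".
    let continued = b"Multi\
                      line";
    (empty, with_tab, continued)
}
```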
2.4.2.2. Raw Byte String Literals
Syntax
RawByteStringLiteral ::= br RawByteStringContent
RawByteStringContent ::= NestedRawByteStringContent | " AsciiCharacter* "
NestedRawByteStringContent ::= # RawByteStringContent #
Legality Rules
2.4.2.2:1 A raw byte string literal is a simple byte string literal that does not recognize escaped characters.
2.4.2.2:2 The type of a raw byte string literal of size N is &'static [u8; N].
Examples
br""
br#""#
br##"left #"# right"##
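As a sketch of 2.4.2.2:1 (the function name is hypothetical), the raw form keeps a reverse solidus as an ordinary byte, while the nested # delimiters allow "# to appear inside the literal:

```rust
/// Raw byte string literals do not process escapes (2.4.2.2:1).
pub fn example_raw_byte_strings() -> (usize, usize, &'static [u8]) {
    let raw = br"\n"; // two bytes: 0x5C (reverse solidus) and 0x6E (`n`)
    let cooked = b"\n"; // one byte: 0x0A
    // Two `#` delimiters allow the sequence `"#` inside the literal.
    let nested: &'static [u8] = br##"left #"# right"##;
    (raw.len(), cooked.len(), nested)
}
```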
2.4.3. C String Literals
Syntax
CStringLiteral ::= RawCStringLiteral | SimpleCStringLiteral
Legality Rules
2.4.3:1 A c string literal is a literal that consists of multiple characters with an implicit 0x00 byte appended to it.
2.4.3:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a c string literal.
2.4.3.1. Simple C String Literals
Syntax
SimpleCStringLiteral ::= c" SimpleCStringContent* "
SimpleCStringContent ::= AsciiEscape | SimpleCStringCharacter | StringContinuation | UnicodeEscape
2.4.3.1:1 A SimpleCStringCharacter is any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), 0x5C (reverse solidus), and 0x00 (null byte).
Legality Rules
2.4.3.1:2 A simple c string literal is a c string literal where the characters are Unicode characters.
2.4.3.1:3 The type of a simple c string literal is &'static core::ffi::CStr.
Examples
c""
c"cat"
c"\tcol\nrow"
c"bell\x07"
c"\u{B80a}"
c"\
multi\
line\
string"
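A minimal sketch of the c string examples above (hypothetical function name; C string literals require Rust 1.77 or later). It uses the CStr methods to_bytes, to_bytes_with_nul, and to_str to show the implicit 0x00 byte and the UTF-8 encoding of a UnicodeEscape:

```rust
use core::ffi::CStr;

/// Simple C string literals. Each evaluates to a `&'static CStr` whose
/// bytes end with an implicit 0x00.
pub fn example_c_strings() -> [&'static CStr; 3] {
    [
        c"cat",
        c"\tcol\nrow", // escapes are processed: 8 bytes before the 0x00
        c"\u{B80a}",   // stored as the UTF-8 encoding of U+B80A (3 bytes)
    ]
}
```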
2.4.3.2. Raw C String Literals
Syntax
RawCStringLiteral ::= cr RawCStringContent
RawCStringContent ::= NestedRawCStringContent | " ~[\r]* "
NestedRawCStringContent ::= # RawCStringContent #
Legality Rules
2.4.3.2:1 A raw c string literal is a simple c string literal that does not recognize escaped characters.
2.4.3.2:2 The type of a raw c string literal is &'static core::ffi::CStr.
Examples
cr""
cr#""#
cr##"left #"# right"##
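As a sketch of 2.4.3.2:1 (hypothetical function name, Rust 1.77 or later), a raw c string keeps a reverse solidus verbatim yet still receives the implicit 0x00 byte of 2.4.3:1:

```rust
use core::ffi::CStr;

/// Raw C string literals keep backslashes verbatim but still end with an
/// implicit 0x00 byte.
pub fn example_raw_c_strings() -> (&'static CStr, &'static CStr) {
    let raw = cr"\n"; // bytes 0x5C 0x6E, then 0x00
    let nested = cr##"left #"# right"##; // `"#` allowed inside
    (raw, nested)
}
```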
2.4.4. Numeric Literals
Syntax
NumericLiteral ::= FloatLiteral | IntegerLiteral
Legality Rules
2.4.4:1 A numeric literal is a literal that denotes a number.
2.4.4.1. Integer Literals
Syntax
IntegerLiteral ::= IntegerContent IntegerSuffix?
IntegerContent ::= BinaryLiteral | DecimalLiteral | HexadecimalLiteral | OctalLiteral
BinaryLiteral ::= 0b BinaryDigitOrUnderscore* BinaryDigit BinaryDigitOrUnderscore*
BinaryDigitOrUnderscore ::= BinaryDigit | _
BinaryDigit ::= [0-1]
DecimalLiteral ::= DecimalDigit DecimalDigitOrUnderscore*
DecimalDigitOrUnderscore ::= DecimalDigit | _
DecimalDigit ::= [0-9]
HexadecimalLiteral ::= 0x HexadecimalDigitOrUnderscore* HexadecimalDigit HexadecimalDigitOrUnderscore*
HexadecimalDigitOrUnderscore ::= HexadecimalDigit | _
HexadecimalDigit ::= [0-9 a-f A-F]
OctalLiteral ::= 0o OctalDigitOrUnderscore* OctalDigit OctalDigitOrUnderscore*
OctalDigitOrUnderscore ::= OctalDigit | _
OctalDigit ::= [0-7]
IntegerSuffix ::= SignedIntegerSuffix | UnsignedIntegerSuffix
SignedIntegerSuffix ::= i8 | i16 | i32 | i64 | i128 | isize
UnsignedIntegerSuffix ::= u8 | u16 | u32 | u64 | u128 | usize
Legality Rules
2.4.4.1:1 An integer literal is a numeric literal that denotes a whole number.
2.4.4.1:2 A binary literal is an integer literal in base 2.
2.4.4.1:3 A decimal literal is an integer literal in base 10.
2.4.4.1:4 A hexadecimal literal is an integer literal in base 16.
2.4.4.1:5 An octal literal is an integer literal in base 8.
2.4.4.1:6 An integer suffix is a component of an integer literal that specifies an explicit integer type.
2.4.4.1:7 A suffixed integer is an integer literal with an integer suffix.
2.4.4.1:8 An unsuffixed integer is an integer literal without an integer suffix.
2.4.4.1:9 The type of a suffixed integer is determined by its integer suffix as follows:
2.4.4.1:10 Suffix i8 specifies type i8.
2.4.4.1:11 Suffix i16 specifies type i16.
2.4.4.1:12 Suffix i32 specifies type i32.
2.4.4.1:13 Suffix i64 specifies type i64.
2.4.4.1:14 Suffix i128 specifies type i128.
2.4.4.1:15 Suffix isize specifies type isize.
2.4.4.1:16 Suffix u8 specifies type u8.
2.4.4.1:17 Suffix u16 specifies type u16.
2.4.4.1:18 Suffix u32 specifies type u32.
2.4.4.1:19 Suffix u64 specifies type u64.
2.4.4.1:20 Suffix u128 specifies type u128.
2.4.4.1:21 Suffix usize specifies type usize.
2.4.4.1:22 The type of an unsuffixed integer is determined by type inference as follows:
2.4.4.1:23 If an integer type can be uniquely determined from the surrounding program context, then the unsuffixed integer has that type.
2.4.4.1:24 If the program context under-constrains the type, then the inferred type is i32.
2.4.4.1:25 If the program context over-constrains the type, then this is considered a static error.
Examples
0b0010_1110_u8
1___2_3
0x4D8a
0o77_52i128
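The following sketch evaluates the integer examples above and illustrates the inference rules of 2.4.4.1:23 and 2.4.4.1:24. The functions, including the helper takes_u16, are hypothetical; the over-constrained case of 2.4.4.1:25 appears only as a comment because rustc rejects it.

```rust
/// Hypothetical helper whose parameter type constrains an unsuffixed integer.
fn takes_u16(value: u16) -> u16 {
    value
}

/// The integer literals from the examples above.
pub fn example_integers() -> (u8, i32, i32, i128) {
    let binary = 0b0010_1110_u8; // suffix u8: 46
    let decimal = 1___2_3; // underscores are ignored: 123
    let hex = 0x4D8a; // 19850; i32 via the return type
    let octal = 0o77_52i128; // suffix i128: 4074
    (binary, decimal, hex, octal)
}

/// Type inference for unsuffixed integers.
pub fn inferred_integer_types() -> (usize, u16) {
    let unconstrained = 1; // nothing constrains it: defaults to i32
    let from_context = 7; // the call below makes it u16
    // let x = 1; let a: u8 = x; let b: u16 = x; // over-constrained: rejected
    (std::mem::size_of_val(&unconstrained), takes_u16(from_context))
}
```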
2.4.4.2. Float Literals
Syntax
FloatLiteral ::= DecimalLiteral . | DecimalLiteral FloatExponent | DecimalLiteral . DecimalLiteral FloatExponent? | DecimalLiteral (. DecimalLiteral)? FloatExponent? FloatSuffix
FloatExponent ::= ExponentLetter ExponentSign? ExponentMagnitude
ExponentLetter ::= e | E
ExponentSign ::= + | -
ExponentMagnitude ::= DecimalDigitOrUnderscore* DecimalDigit DecimalDigitOrUnderscore*
FloatSuffix ::= f32 | f64
Legality Rules
2.4.4.2:1 A float literal is a numeric literal that denotes a fractional number.
2.4.4.2:2 A float suffix is a component of a float literal that specifies an explicit floating-point type.
2.4.4.2:3 A suffixed float is a float literal with a float suffix.
2.4.4.2:4 An unsuffixed float is a float literal without a float suffix.
2.4.4.2:5 The type of a suffixed float is determined by the float suffix as follows:
2.4.4.2:6 Suffix f32 specifies type f32.
2.4.4.2:7 Suffix f64 specifies type f64.
2.4.4.2:8 The type of an unsuffixed float is determined by type inference as follows:
2.4.4.2:9 If a floating-point type can be uniquely determined from the surrounding program context, then the unsuffixed float has that type.
2.4.4.2:10 If the program context under-constrains the type, then the inferred type is f64.
2.4.4.2:11 If the program context over-constrains the type, then this is considered a static error.
Examples
45.
8E+1_820
3.14e5
8_031.4_e-12f64
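A sketch of the float examples above and of the default type in 2.4.4.2:10 (the function name is hypothetical). The example 8E+1_820 is omitted because its value exceeds the range of f64, and rustc reports such out-of-range float literals through its overflowing_literals lint.

```rust
/// Float literals, float suffixes, and the default unsuffixed type.
pub fn example_floats() -> (f64, f64, usize, usize) {
    let exponent = 3.14e5; // 314000.0; f64 via the return type
    let suffixed = 8_031.4_e-12f64; // suffix f64: 8031.4e-12
    let unconstrained = 45.; // nothing constrains it: defaults to f64
    let single = 2.5f32; // suffix f32
    (
        exponent,
        suffixed,
        std::mem::size_of_val(&unconstrained),
        std::mem::size_of_val(&single),
    )
}
```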
2.4.5. Character Literals
Syntax
CharacterLiteral ::= ' CharacterContent '
CharacterContent ::= AsciiEscape | CharacterLiteralCharacter | UnicodeEscape
AsciiEscape ::= \0 | \" | \' | \t | \n | \r | \\ | \x OctalDigit HexadecimalDigit
2.4.5:1 A CharacterLiteralCharacter is any Unicode character except characters 0x09 (horizontal tabulation), 0x0A (new line), 0x0D (carriage return), 0x27 (apostrophe), and 0x5C (reverse solidus).
2.4.5:2 A UnicodeEscape starts with a \u{ literal, followed by 1 to 6 instances of a HexadecimalDigit, inclusive, followed by a } character. It can represent any Unicode code point between U+00000 and U+10FFFF, inclusive, except the surrogate code points U+D800 through U+DFFF, inclusive.
Legality Rules
2.4.5:3 A character literal is a literal that denotes a fixed Unicode character.
2.4.5:4 The type of a character literal is char.
Examples
'a'
'\t'
'\x1b'
'\u{1F30}'
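A minimal sketch of the character examples above (hypothetical function names). The surrogate restriction of 2.4.5:2 is shown at run time through char::from_u32, which returns None for a surrogate code point; a literal such as '\u{D800}' is rejected at compile time instead.

```rust
/// The character literals from the examples above; each has type `char`.
pub fn example_chars() -> [char; 4] {
    ['a', '\t', '\x1b', '\u{1F30}']
}

/// `'\u{D800}'` would not compile; `char` cannot hold a surrogate.
pub fn surrogate_is_rejected() -> bool {
    char::from_u32(0xD800).is_none()
}
```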
2.4.6. String Literals
Syntax
StringLiteral ::= RawStringLiteral | SimpleStringLiteral
Legality Rules
2.4.6:1 A string literal is a literal that consists of multiple characters.
2.4.6:2 The character sequence 0x0D 0x0A (carriage return, new line) is replaced by 0x0A (new line) inside of a string literal.
2.4.6.1. Simple String Literals
Syntax
SimpleStringLiteral ::= " SimpleStringContent* "
SimpleStringContent ::= AsciiEscape | SimpleStringCharacter | StringContinuation | UnicodeEscape
2.4.6.1:1 A SimpleStringCharacter is any Unicode character except characters 0x0D (carriage return), 0x22 (quotation mark), and 0x5C (reverse solidus).
2.4.6.1:2 StringContinuation is the character sequence 0x5C 0x0A (reverse solidus, new line).
Legality Rules
2.4.6.1:3 A simple string literal is a string literal where the characters are Unicode characters.
2.4.6.1:4 The type of a simple string literal is &'static str.
Examples
""
"cat"
"\tcol\nrow"
"bell\x07"
"\u{B80a}"
"\
multi\
line\
string"
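The following sketch (hypothetical function name) checks several of the string examples above. The skipping of leading whitespace after each StringContinuation is rustc behavior that this section does not itself state.

```rust
/// Simple string literals; each has type `&'static str`.
pub fn example_strings() -> [&'static str; 3] {
    let escapes = "\tcol\nrow"; // 8 bytes after escape processing
    let unicode = "\u{B80a}"; // one char, 3 bytes of UTF-8
    // Each StringContinuation removes its new line; rustc also skips the
    // leading whitespace of the following line.
    let continued = "\
        multi\
        line\
        string";
    [escapes, unicode, continued]
}
```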
2.4.6.2. Raw String Literals
Syntax
RawStringLiteral ::= r RawStringContent
RawStringContent ::= NestedRawStringContent | " ~[\r]* "
NestedRawStringContent ::= # RawStringContent #
Legality Rules
2.4.6.2:1 A raw string literal is a simple string literal that does not recognize escaped characters.
2.4.6.2:2 The type of a raw string literal is &'static str.
Examples
r""
r#""#
r##"left #"# right"##
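A minimal sketch of 2.4.6.2:1 (hypothetical function name): each additional # in the delimiters lets the literal contain a longer run of quotation mark and # characters, and no escapes are processed.

```rust
/// Raw string literals keep every character between the delimiters verbatim.
pub fn example_raw_strings() -> [&'static str; 3] {
    let path = r"C:\temp\new"; // backslashes are not escapes
    let quoted = r#"say "hi""#; // one `#` allows `"` inside
    let nested = r##"left #"# right"##; // two `#` allow `"#` inside
    [path, quoted, nested]
}
```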
2.4.7. Boolean Literals
Syntax
BooleanLiteral ::=
false
| true
Legality Rules
2.4.7:1 A boolean literal is a literal that denotes the truth values of logic and Boolean algebra.
2.4.7:2 The type of a boolean literal is bool.
Examples
true
2.6. Keywords
Syntax
Keyword ::= ReservedKeyword | StrictKeyword | WeakKeyword
Legality Rules
2.6:1 A keyword is a word in program text that has special meaning.
2.6:2 Keywords are case sensitive.
2.6.1. Strict Keywords
Syntax
StrictKeyword ::=
as
| async
| await
| break
| const
| continue
| crate
| dyn
| else
| enum
| extern
| false
| fn
| for
| if
| impl
| in
| let
| loop
| match
| mod
| move
| mut
| pub
| ref
| return
| self
| Self
| static
| struct
| super
| trait
| true
| type
| unsafe
| use
| where
| while
Legality Rules
2.6.1:1 A strict keyword is a keyword that always holds its special meaning.
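Because a strict keyword always holds its special meaning, using one as a name requires the raw identifier form of 2.3. A short sketch with hypothetical function names:

```rust
/// A strict keyword can only be used as a name through a raw identifier.
pub fn r#match(value: i32) -> i32 {
    value * 2
}

pub fn strict_keywords_as_names() -> i32 {
    // let loop = 21; // rejected: `loop` always acts as a keyword
    let r#loop = 21;
    // r#crate, r#self, r#Self, and r#super are not allowed (2.3:1).
    r#match(r#loop)
}
```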
2.6.2. Reserved Keywords
Syntax
ReservedKeyword ::=
abstract
| become
| box
| do
| final
| macro
| override
| priv
| try
| typeof
| unsized
| virtual
| yield
Legality Rules
2.6.2:1 A reserved keyword is a keyword that is not yet in use.
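Reserved keywords have no meaning yet, but rustc still rejects them as plain names; the raw identifier form remains available. A sketch with a hypothetical function name:

```rust
pub fn reserved_keywords_as_names() -> i32 {
    // let yield = 1; // rejected: `yield` is a reserved keyword
    let r#yield = 1;
    let r#become = 2;
    r#yield + r#become
}
```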
2.6.3. Weak Keywords
Syntax
WeakKeyword ::=
macro_rules
| 'static
| union
| safe
Legality Rules
2.6.3:1 A weak keyword is a keyword whose special meaning depends on the context.
2.6.3:2 Word macro_rules acts as a keyword only when used in the context of a MacroRulesDefinition.
2.6.3:3 Word 'static acts as a keyword only when used in the context of a LifetimeIndication.
2.6.3:4 Word union acts as a keyword only when used in the context of a UnionDeclaration.
2.6.3:5 Word safe acts as a keyword only when used as a qualifier of a FunctionDeclaration or StaticDeclaration in the context of an ExternalBlock.
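A sketch of 2.6.3:4 and 2.6.3:5 (the union Bits and the function name are hypothetical): union acts as a keyword only where it begins a union declaration, and both union and safe are ordinary identifiers elsewhere.

```rust
/// Here `union` begins a UnionDeclaration, so it acts as a keyword.
#[repr(C)]
pub union Bits {
    pub int: u32,
    pub float: f32,
}

pub fn weak_keywords_as_names() -> (i32, u32) {
    // Outside those contexts, `union` and `safe` are ordinary identifiers.
    let union = 1;
    let safe = 2;
    let value = Bits { float: 1.0 };
    // Reading a union field is unsafe; both fields occupy the same 4 bytes,
    // so this reads the bit pattern of 1.0f32.
    let bits = unsafe { value.int };
    (union + safe, bits)
}
```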
2.5. Comments
Syntax
Legality Rules
2.5:1 A comment is a lexical element that acts as an annotation or an explanation in program text.
2.5:2 A block comment is a comment that spans one or more lines.
2.5:3 A line comment is a comment that spans exactly one line.
2.5:4 An inner block doc is a block comment that applies to an enclosing non-comment construct.
2.5:5 An inner line doc is a line comment that applies to an enclosing non-comment construct.
2.5:6 An inner doc comment is either an inner block doc or an inner line doc.
2.5:7 An outer block doc is a block comment that applies to a subsequent non-comment construct.
2.5:8 An outer line doc is a line comment that applies to a subsequent non-comment construct.
2.5:9 An outer doc comment is either an outer block doc or an outer line doc.
2.5:10 A doc comment is a comment class that includes inner block docs, inner line docs, outer block docs, and outer line docs.
2.5:11 Character 0x0D (carriage return) shall not appear in a comment.
2.5:12 Block comments, inner block docs, and outer block docs shall extend one or more lines.
2.5:13 Line comments, inner line docs, and outer line docs shall extend exactly one line.
2.5:14 Outer block docs and outer line docs shall apply to a subsequent non-comment construct.
2.5:15 Inner block docs and inner line docs shall apply to an enclosing non-comment construct.
2.5:16 Inner block docs and inner line docs are equivalent to attribute doc of the form #![doc = content], where content is a string literal form of the comment without the leading //! or /*! and the trailing */ characters.
2.5:17 Outer block docs and outer line docs are equivalent to attribute doc of the form #[doc = content], where content is a string literal form of the comment without the leading /// or /** and the trailing */ characters.
Examples
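The sketch below places each kind of comment in a position that 2.5:14 and 2.5:15 allow. The module documented and its functions are hypothetical. The inner docs sit at the start of the module they apply to, the outer docs precede the items they document, and the final function uses the attribute form from 2.5:17 directly.

```rust
pub mod documented {
    //! Inner line doc: documents the enclosing module `documented`.

    /*! Inner block doc: also documents the enclosing module. */

    /// Outer line doc: documents the function that follows.
    pub fn answer() -> u32 {
        // Line comment: a non-doc comment, which does not affect the
        // meaning of the program (2.2:1).
        42
    }

    /** Outer block doc: documents `also_answer`. */
    pub fn also_answer() -> u32 {
        /* Block comment
           spanning two lines. */
        answer()
    }

    #[doc = " Attribute form, equivalent to an outer line doc."]
    pub fn via_attribute() -> u32 {
        answer()
    }
}
```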