C++ Tokens
A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals, operators, punctuators, and other separators. A stream of these tokens makes up a translation unit.
Tokens are usually separated by "white space." White space can be one or more of the following:
Blanks
Horizontal or vertical tabs
New lines
Formfeeds
Comments
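As a minimal sketch of the list above, the following fragment separates tokens with a comment, new lines, and blanks. The function names whitespace_demo and check_whitespace_demo are hypothetical and chosen only for illustration. Without any separator, the characters inta would be scanned as one identifier rather than as the keyword int followed by the identifier a.

```cpp
#include <cassert>

// Hypothetical example function, not part of any library.
int whitespace_demo() {
    // A comment counts as white space, so it separates `int` from `a`.
    int/* comment used as a separator */a = 1;

    // New lines and blanks separate tokens in the same way.
    int
        b
        =
        2;

    return a + b;
}

// Self-check with illustrative values.
void check_whitespace_demo() {
    assert(whitespace_demo() == 3);  // 1 + 2
}
```

Calling check_whitespace_demo() from main should pass silently, since both declarations are parsed as ordinary int definitions.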
The following are considered tokens:
keyword
identifier
constant
operator
punctuator
The following are considered preprocessing tokens:
header-name
identifier
pp-number
character-constant
string-literal
operator
punctuator
each nonwhite-space character that cannot be one of the above
The parser separates tokens from the input stream by creating the longest token possible using the input characters in a left-to-right scan. Consider this code fragment:
a = i+++j;
The programmer who wrote the code might have intended either of these two statements:
a = i + (++j);
a = (i++) + j;
Because the parser creates the longest token possible from the input stream, it chooses the second interpretation: the characters i+++j are scanned as the tokens i, ++, +, and j, which group as (i++) + j.
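The effect of the longest-token rule can be observed at run time. The sketch below is an illustration only: the function names longest_token_demo and check_longest_token_demo and the starting values 1 and 10 are assumptions, not part of the original text. It evaluates the fragment and then checks which operand was incremented.

```cpp
#include <cassert>

// Hypothetical helper: i and j are passed by reference so the caller
// can see which operand was modified by the expression.
int longest_token_demo(int& i, int& j) {
    int a = i+++j;  // scanned as: i  ++  +  j
    return a;
}

// Self-check with illustrative values.
void check_longest_token_demo() {
    int i = 1, j = 10;
    int a = longest_token_demo(i, j);
    assert(a == 11);  // 1 + 10: the old value of i is used
    assert(i == 2);   // i was post-incremented
    assert(j == 10);  // j is unchanged, so ++j was not chosen
}
```

Had the parser chosen the first interpretation, a would be 12 and j would be 11, with i left at 1. Writing the intended grouping with explicit spaces or parentheses avoids relying on the scan rule.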