Chapter 2. Lexical Elements

Java source code consists of words or symbols called lexical elements, or tokens. Java lexical elements include line terminators, whitespace, comments, keywords, identifiers, separators, operators, and literals. The words and symbols of the Java programming language are drawn from the Unicode character set.

Unicode and ASCII

Maintained by the Unicode Consortium, a standards organization, Unicode is a universal character set whose first 128 characters are the same as those in the American Standard Code for Information Interchange (ASCII) character set. Unicode provides a unique number for each character, usable across all platforms, programs, and languages. Java SE 9 supports Unicode 8.0.0; Java SE 8 supports Unicode 6.2.0. You can find more information about the Unicode Standard in the online manual.
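The overlap between ASCII and the first 128 Unicode characters can be checked with a short sketch. The class name AsciiOverlap and the method firstBlockMatches are hypothetical names chosen for illustration; the code decodes each ASCII byte value and compares the result with the Unicode code point of the same number.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are not from the book.
public class AsciiOverlap {

    // Returns true if every ASCII code from 0 to 127 decodes to the
    // Unicode code point that has the same numeric value.
    static boolean firstBlockMatches() {
        for (int code = 0; code < 128; code++) {
            String decoded = new String(new byte[] {(byte) code}, StandardCharsets.US_ASCII);
            if (decoded.length() != 1 || decoded.codePointAt(0) != code) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(firstBlockMatches());   // prints true
        System.out.println("A".codePointAt(0));    // prints 65
    }
}
```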

Java comments, identifiers, and the contents of character and string literals are not limited to ASCII characters. All other Java input elements are formed from ASCII characters.
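As a minimal sketch of this rule, the following class uses a non-ASCII identifier and a non-ASCII string literal. The class name UnicodeElements and its members are hypothetical. To keep the source file itself pure ASCII, the non-ASCII characters are written as Unicode escapes (\u03c0 for the Greek letter pi, \u00e9 for e with an acute accent); typing the characters directly should behave the same way, assuming the compiler reads the file with a matching encoding such as UTF-8.

```java
// Illustrative sketch; the class and member names are not from the book.
public class UnicodeElements {

    // An identifier made of a single non-ASCII letter, written here as
    // the Unicode escape for U+03C0 (Greek small letter pi).
    static final double \u03c0 = Math.PI;

    // The identifier can be used like any ASCII-only name.
    static double circleArea(double radius) {
        return \u03c0 * radius * radius;
    }

    // A string literal whose fourth character is U+00E9, a non-ASCII letter.
    static String cafe() {
        return "caf\u00e9";
    }

    public static void main(String[] args) {
        System.out.println(circleArea(1.0));
        System.out.println(cafe().length());                          // prints 4
        System.out.println(Character.isJavaIdentifierStart(0x03C0));  // prints true
    }
}
```

The keywords, separators, and operators in the sketch (static, final, parentheses, the asterisk) remain ASCII, which matches the second sentence of the paragraph above.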

The Unicode version supported by a given release of the Java platform is documented in the Character class of the Java API. The Unicode Character Code Charts for scripts, symbols, and punctuation are published online by the Unicode Consortium.

Printable ASCII Characters

ASCII reserves code 32 (the space character) and codes 33–126 (letters, digits, punctuation marks, and a few other symbols) for printable characters. Table 2-1 lists the decimal values followed by the corresponding ASCII characters for these codes.
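The same decimal-to-character pairing can be generated programmatically. The sketch below is not the book's table; it is a hypothetical helper (PrintableAscii.table) that casts each code from 32 through 126 to a char and emits one "decimal character" pair per line.

```java
// Illustrative sketch; the class and method names are not from the book.
public class PrintableAscii {

    // Builds one line per printable ASCII code (32 through 126):
    // the decimal value, a space, then the corresponding character.
    static String table() {
        StringBuilder sb = new StringBuilder();
        for (int code = 32; code <= 126; code++) {
            sb.append(code).append(' ').append((char) code).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(table());
    }
}
```

The first line of output is the code 32 followed by the space character itself, so it appears to end in trailing whitespace; the last line pairs 126 with the tilde.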
