
Pattern Syntax Characters (Pattern Syntax, Pat Syn)
This set contains characters that are used as operators or separators or in other
special roles in patterns. This set is fixed—i.e., it will not be extended. There are
2,760 characters in it, as defined in the PropList.txt file of the Unicode database.
The ASCII characters in the set are: !"#$%&'()*+,-./:;<=>?@[\]^`{|}~.
Pattern Whitespace Characters (Pattern White Space, Pat WS)
This set contains characters treated as whitespace in patterns. Whitespace may be
needed to separate symbols from each other, but it is otherwise insignificant. This
set too is fixed. There are only 11 characters in it: horizontal tab (U+0009), line
feed (U+000A), vertical tab (U+000B), form feed (U+000C), carriage return
(U+000D), space (U+0020), next line (U+0085), left-to-right mark (U+200E),
right-to-left mark (U+200F), line separator (U+2028), and paragraph separator
(U+2029).
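As a rough illustration, the two sets can be written down directly in code. The following Python sketch hard-codes the 11 Pattern Whitespace characters listed above and only the ASCII part of the Pattern Syntax set (which is ASCII punctuation without the underscore). The names PATTERN_WHITE_SPACE, ASCII_PATTERN_SYNTAX, is_pattern_white_space, and split_pattern are hypothetical, chosen for this example; a real implementation would presumably read the full sets from PropList.txt.

```python
import string

# The 11 Pattern Whitespace code points, hard-coded from the list above.
PATTERN_WHITE_SPACE = frozenset(
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u200E\u200F\u2028\u2029"
)

# ASCII subset of Pattern Syntax only: ASCII punctuation minus the underscore.
# The full set is much larger and is not reproduced here.
ASCII_PATTERN_SYNTAX = frozenset(string.punctuation) - {"_"}


def is_pattern_white_space(ch: str) -> bool:
    """Return True if ch is one of the Pattern Whitespace characters."""
    return ch in PATTERN_WHITE_SPACE


def split_pattern(text: str) -> list[str]:
    """Hypothetical helper: split text into tokens.

    Pattern whitespace only separates tokens and is then discarded;
    each ASCII pattern syntax character becomes a token of its own.
    """
    tokens = []
    current = ""
    for ch in text:
        if ch in PATTERN_WHITE_SPACE or ch in ASCII_PATTERN_SYNTAX:
            if current:
                tokens.append(current)
                current = ""
            if ch in ASCII_PATTERN_SYNTAX:
                tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens
```

For example, split_pattern("a+b \u200E c") treats the space and the left-to-right mark as insignificant separators and the plus sign as an operator. Note that the no-break space (U+00A0) is not in the set, so this sketch would keep it as part of a token.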
The policy that Pattern Syntax Characters and Pattern Whitespace Characters are fixed
(closed) sets does not mean that actual identifier syntax needs to use exactly those sets.
On the contrary, fixing the sets makes it easier to define identifier syntax on a Unicode
basis: it can be defined by taking the Unicode sets as an immutable base and then adding or
removing characters as desired. Of course, if a specific identifier syntax definition makes
a character such as $ allowed in an identifier, ...