Commonly used Unicode character properties

The following Unicode character properties are commonly used in regular expressions that need to match Unicode text:

Unicode character class   Meaning
\p{L}                     Match any letter from any script
\p{Lu}                    Match any uppercase letter from any script
\p{Ll}                    Match any lowercase letter from any script
\p{N}                     Match any numeric character from any script (use \p{Nd} for decimal digits only)
\p{P}                     Match any punctuation character
\p{Z}                     Match any space or invisible separator (tab and newline are control characters, not separators)
\p{C}                     Match any invisible control or format character, or any private-use or unassigned code point
\p{Sc}                    Match any currency symbol
\R                        Match any Unicode linebreak sequence; equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
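To see how these properties differ in practice, here is a minimal sketch, assuming Java 9 or later, that applies several of them to one sample string. The class name UnicodePropertyDemo, the helper findAll, and the sample text are hypothetical and chosen only for illustration. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding; the expected results in the comments show the matched strings, which may display differently on a console that is not UTF-8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies Unicode property classes to one sample string.
public class UnicodePropertyDemo {

    // Sample text "Café<TAB>Ωmega costs €42, ¥٣٤." written with Unicode escapes
    // so the source compiles regardless of the file encoding.
    static final String TEXT = "Caf\u00E9\t\u03A9mega costs \u20AC42, \u00A5\u0663\u0664.";

    // Hypothetical helper: returns every substring of input matched by regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\p{L}+", TEXT));        // [Café, Ωmega, costs]
        System.out.println(findAll("\\p{Lu}", TEXT));        // [C, Ω]
        System.out.println(findAll("\\p{Ll}+", TEXT));       // [afé, mega, costs]
        System.out.println(findAll("\\p{N}+", TEXT));        // [42, ٣٤]
        System.out.println(findAll("\\d+", TEXT));           // [42]  (\d is ASCII-only by default)
        System.out.println(findAll("\\p{P}", TEXT));         // [,, .]  (a comma and a period)
        System.out.println(findAll("\\p{Sc}", TEXT));        // [€, ¥]
        System.out.println(findAll("\\p{Z}", TEXT).size());  // 3  (the spaces; the tab is not in Z)
        System.out.println(findAll("\\p{C}", TEXT).size());  // 1  (the tab, a control character)
    }
}
```

Note the contrast between \p{N} and \d: by default Java's \d matches only the ASCII digits 0 to 9, so the Arabic-Indic digits are found only by \p{N}.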
It is recommended to use \R to match any newline ...
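As an illustration of why \R is preferable to a literal \n, here is a minimal sketch, assuming Java 8 or later (the first release whose Pattern supports \R), that splits text mixing several line-break conventions. The names LineBreakDemo and lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: splits text that mixes several line-break conventions.
public class LineBreakDemo {

    // Hypothetical helper: splits on any Unicode line break sequence (\R).
    // A CR LF pair is consumed as a single break, so no empty line appears between them.
    static List<String> lines(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        // CRLF, LF, CR, LINE SEPARATOR (U+2028) and NEXT LINE (U+0085) in one string.
        String mixed = "first\r\nsecond\nthird\rfourth\u2028fifth\u0085sixth";

        System.out.println(lines(mixed));              // [first, second, third, fourth, fifth, sixth]
        System.out.println(mixed.split("\n").length);  // 3  (LF alone misses CR, U+2028 and U+0085)
    }
}
```

Splitting on "\n" alone leaves a stray carriage return on "first" and fails to separate the lines ended by CR, U+2028, and U+0085, which \R handles uniformly.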
