Unicode support in Java regular expressions

So far, all the examples in the first two chapters have used English text only. However, a regular expression engine needs to handle text in any language that can be written with Unicode characters. Java has a Unicode-based regex engine with extensive support for Unicode scripts, blocks, and categories.

A specific Unicode character can be matched in two different ways in Java:

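  1. Unicode escape sequence or the \u notation: This can be written as "\u1234" or "\\u1234". In the first form, the Java compiler replaces the escape with the character itself before the regex engine sees it; in the second form, the regex engine receives the escape and interprets it.
  2. Hex notation: This can be written as "\\x{1234}". The backslash must be doubled in a Java string literal, because "\x" on its own is not a valid Java escape.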
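The following is a minimal sketch of both notations, not code from the book. The class name UnicodeEscapeDemo and the helper method matches are hypothetical names chosen for illustration, and U+1234 is used only as an arbitrary sample code point. Each call checks that the pattern matches a one-character string containing that code point.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class UnicodeEscapeDemo {

    // Returns true when the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A one-character string holding the code point U+1234.
        String input = "\u1234";

        // Single backslash: the compiler turns the escape into the
        // character, so the regex is the literal character itself.
        System.out.println(matches("\u1234", input));

        // Double backslash: the regex engine receives the six-character
        // escape and resolves it to the code point.
        System.out.println(matches("\\u1234", input));

        // Hex notation with braces, resolved by the regex engine.
        System.out.println(matches("\\x{1234}", input));
    }
}
```

All three calls are expected to print true, since each pattern describes the same single character.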
