Java 9 Regular Expressions by Anubhava Srivastava

Examples of matching Unicode text in regular expressions

The following regex matches input that consists entirely of Unicode letters, including accented characters such as "à":

    ^\p{L}+$
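
As a minimal sketch of how this pattern behaves in Java, the class below compiles it once and tests a few inputs. The class and method names (`UnicodeLetterDemo`, `isLettersOnly`) are hypothetical and chosen only for illustration. The second input shows a caveat: when "à" is written as a plain "a" followed by a combining accent, the accent belongs to the mark category (`\p{M}`), not the letter category, so the input is rejected.

```java
import java.util.regex.Pattern;

public class UnicodeLetterDemo {
    // One or more Unicode letters and nothing else.
    private static final Pattern LETTERS_ONLY = Pattern.compile("^\\p{L}+$");

    // Hypothetical helper: true when the whole input is made of letters.
    public static boolean isLettersOnly(String input) {
        return LETTERS_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Precomposed a-with-grave (U+00E0) is a letter.
        System.out.println(isLettersOnly("voil\u00E0"));   // true
        // Plain "a" followed by combining grave accent (U+0300), which is a mark.
        System.out.println(isLettersOnly("voila\u0300"));  // false
        // Digits are not letters.
        System.out.println(isLettersOnly("abc123"));       // false
    }
}
```

If decomposed input is expected, one option is to normalize it with `java.text.Normalizer` (form NFC) before matching, or to allow `\p{M}` in the character class.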

The following regex matches text that consists only of Latin-script characters and Unicode space separators (the `Zs` category, which does not include tabs or line breaks):

    ^[\p{IsLatin}\p{Zs}]+$
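
The following sketch exercises this pattern from Java. The names `LatinTextDemo` and `isLatinText` are assumptions made for the example. It illustrates that `\p{Zs}` accepts a no-break space but not a tab, and that digits are rejected because they belong to the Common script rather than Latin.

```java
import java.util.regex.Pattern;

public class LatinTextDemo {
    // Latin-script characters and space separators only.
    private static final Pattern LATIN_TEXT =
            Pattern.compile("^[\\p{IsLatin}\\p{Zs}]+$");

    // Hypothetical helper: true when the whole input fits the pattern.
    public static boolean isLatinText(String input) {
        return LATIN_TEXT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // e-with-acute (U+00E9) is Latin; U+0020 is a space separator.
        System.out.println(isLatinText("caf\u00E9 au lait"));      // true
        // No-break space (U+00A0) is also in the Zs category.
        System.out.println(isLatinText("caf\u00E9\u00A0au lait")); // true
        // Tab is a control character, not a space separator.
        System.out.println(isLatinText("one\ttwo"));               // false
        // Digits have the Common script, not Latin.
        System.out.println(isLatinText("abc 123"));                // false
    }
}
```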

The following regex detects the presence of a character from the Hebrew block anywhere in the input:

    \p{InHebrew}

The following regex detects input that consists only of characters from the Arabic block:

    ^\p{InArabic}+$
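
The Hebrew and Arabic patterns differ in how they are applied: the unanchored one is meant for a search, while the anchored one must cover the whole input. A minimal sketch of that contrast follows, using `Matcher.find()` for the former and `Matcher.matches()` for the latter. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BlockMatchDemo {
    private static final Pattern HAS_HEBREW = Pattern.compile("\\p{InHebrew}");
    private static final Pattern ONLY_ARABIC = Pattern.compile("^\\p{InArabic}+$");

    // Hypothetical helper: true if any character is in the Hebrew block.
    public static boolean containsHebrew(String input) {
        return HAS_HEBREW.matcher(input).find();
    }

    // Hypothetical helper: true if every character is in the Arabic block.
    public static boolean isOnlyArabic(String input) {
        return ONLY_ARABIC.matcher(input).matches();
    }

    public static void main(String[] args) {
        String shalom = "\u05E9\u05DC\u05D5\u05DD"; // Hebrew: shin, lamed, vav, final mem
        String salam = "\u0633\u0644\u0627\u0645";  // Arabic: seen, lam, alef, meem

        System.out.println(containsHebrew("Hello " + shalom)); // true
        System.out.println(containsHebrew("Hello"));           // false
        System.out.println(isOnlyArabic(salam));               // true
        // An ASCII space (U+0020) lies outside the Arabic block.
        System.out.println(isOnlyArabic(salam + " " + salam)); // false
    }
}
```

The last case suggests that real-world "Arabic only" validation would probably need to allow spaces as well, for example by adding `\p{Zs}` to a character class.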

How can we match Urdu text? Urdu is a language written in the Arabic script, not a script of its own, so there is no Urdu script or block property to use. Instead, we need to match the relevant Unicode code point ranges:

    U+0600 to U+06FF
    U+0750 to U+077F
    U+FB50 to U+FDFF
    U+FE70 to U+FEFF

A Java regex to detect the presence of any Urdu character will be:

[\u0600-\u06FF\u0750- ...
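
The expression above is cut off in this excerpt, so the sketch below does not reproduce it. Instead it assembles a character class from the four ranges listed earlier, on the assumption that they are simply combined into one class. The names `UrduDetectDemo` and `containsUrduChar` are hypothetical.

```java
import java.util.regex.Pattern;

public class UrduDetectDemo {
    // Assumption: one character class built from the four listed ranges.
    private static final Pattern URDU_CHAR = Pattern.compile(
            "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]");

    // Hypothetical helper: true if any character falls in one of the ranges.
    public static boolean containsUrduChar(String input) {
        return URDU_CHAR.matcher(input).find();
    }

    public static void main(String[] args) {
        String urdu = "\u0627\u0631\u062F\u0648"; // the word "Urdu": alef, reh, dal, waw

        System.out.println(containsUrduChar("Language: " + urdu)); // true
        System.out.println(containsUrduChar("Language: Urdu"));    // false
    }
}
```

These ranges cover Arabic-script characters in general, so the same check also reports `true` for Arabic or Persian text. It detects the script's code points, not the Urdu language itself.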
