Examples of matching Unicode text in regular expressions

The following regex matches a string made up entirely of Unicode letters, including precomposed accented characters such as "à":

    ^\p{L}+$
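As a minimal sketch of how this expression might be used from Java, the class below wraps it in a helper method. The names `LetterOnlyDemo` and `isAllLetters` are illustrative assumptions, not part of the original text. Note that an accent written as a separate combining mark belongs to the mark category (`\p{M}`), not `\p{L}`, so the decomposed form of "à" is rejected.

```java
import java.util.regex.Pattern;

final class LetterOnlyDemo {
    // Same expression as above; each backslash is doubled inside a Java string literal.
    static final Pattern LETTERS = Pattern.compile("^\\p{L}+$");

    // Returns true when the whole input consists of one or more Unicode letters.
    static boolean isAllLetters(String input) {
        return LETTERS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("d\u00E9j\u00E0")); // precomposed accented letters: true
        System.out.println(isAllLetters("a\u0300"));        // letter plus combining accent: false
        System.out.println(isAllLetters("abc1"));           // a digit is not a letter: false
    }
}
```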

The following regex matches text consisting only of Latin-script characters and Unicode space separators (the Zs category, which excludes tabs and line breaks):

    ^[\p{IsLatin}\p{Zs}]+$
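The behavior of this expression can be sketched in Java as follows. The class and method names are hypothetical and chosen only for illustration. The examples show that a no-break space (category Zs) is accepted, while a tab, a comma, and Cyrillic letters are not, because none of them is a Latin-script character or a space separator.

```java
import java.util.regex.Pattern;

final class LatinTextDemo {
    // Latin-script characters and space separators only, one or more.
    static final Pattern LATIN_TEXT = Pattern.compile("^[\\p{IsLatin}\\p{Zs}]+$");

    static boolean isLatinText(String input) {
        return LATIN_TEXT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLatinText("caf\u00E9 au lait"));   // true
        System.out.println(isLatinText("caf\u00E9\u00A0noir")); // no-break space is Zs: true
        System.out.println(isLatinText("one\ttwo"));            // tab is a control character: false
        System.out.println(isLatinText("Hello, world"));        // comma is not Latin script: false
    }
}
```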

The following regex detects the presence of a character from the Hebrew Unicode block anywhere in the input:

    \p{InHebrew}
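Because this expression has no anchors, it is meant to be searched for rather than matched against the whole input. A minimal Java sketch, with the illustrative names `HebrewDetectDemo` and `containsHebrew` assumed for this example, uses `Matcher.find()` for that purpose:

```java
import java.util.regex.Pattern;

final class HebrewDetectDemo {
    // Matches any single character from the Hebrew Unicode block.
    static final Pattern HEBREW = Pattern.compile("\\p{InHebrew}");

    // find() succeeds as soon as one Hebrew-block character appears anywhere in the input.
    static boolean containsHebrew(String input) {
        return HEBREW.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsHebrew("abc \u05E9\u05DC\u05D5\u05DD")); // true
        System.out.println(containsHebrew("abc"));                          // false
    }
}
```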

The following regex matches input that consists only of characters from the Arabic Unicode block:

    ^\p{InArabic}+$
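A short Java sketch of this check follows; the class and method names are assumptions made for the example. One consequence worth noting is that an ordinary space (U+0020) lies outside the Arabic block, so a phrase of several words separated by spaces is rejected by this expression as written.

```java
import java.util.regex.Pattern;

final class ArabicOnlyDemo {
    // One or more characters, all from the Arabic Unicode block.
    static final Pattern ARABIC_ONLY = Pattern.compile("^\\p{InArabic}+$");

    static boolean isArabicOnly(String input) {
        return ARABIC_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println(isArabicOnly(word));                  // true
        System.out.println(isArabicOnly(word + " " + word));     // ASCII space is outside the block: false
        System.out.println(isArabicOnly("abc"));                 // false
    }
}
```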

How can we match Urdu text? Urdu is a language rather than a Unicode script (it is written in the Arabic script), so we need to match specific Unicode code point ranges. These are as follows:

    U+0600 to U+06FF
    U+0750 to U+077F
    U+FB50 to U+FDFF
    U+FE70 to U+FEFF

A Java regex to detect the presence of any Urdu character will be:

    [\u0600-\u06FF\u0750- ...
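As a minimal sketch, assuming the character class is simply the union of the four code point ranges listed earlier, a detection helper in Java might look like the following. The pattern here is assembled from those ranges for illustration, and the names `UrduDetectDemo` and `containsUrdu` are hypothetical.

```java
import java.util.regex.Pattern;

final class UrduDetectDemo {
    // Assumption: one character class covering the four ranges listed in the text.
    static final Pattern URDU_CHAR = Pattern.compile(
            "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]");

    // find() reports whether at least one character from these ranges is present.
    static boolean containsUrdu(String input) {
        return URDU_CHAR.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsUrdu("Hello \u0627\u0631\u062F\u0648")); // true
        System.out.println(containsUrdu("Hello"));                          // false
    }
}
```

Since these ranges cover the Arabic script in general, this sketch cannot tell Urdu apart from Arabic or Persian text; it only reports that characters from these ranges are present.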
