Recipe 43 Enabling Unicode Features with the u Flag

Task

Suppose you run an online forum and want to limit the characters in user posts to letters, numbers, underscores, hyphens, and emoticons. You can enforce the first four rules with a character class like [-\w]+. Recall from Recipe 27, Matching One of Several Characters with the Character Class, that \w is equivalent to [a-zA-Z0-9_], so the hyphen is the only addition needed there. That leaves the emoticons.
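As a minimal sketch of the first four rules, the class can be anchored so that the whole post must match. The names basicPostPattern and isBasicPost are hypothetical and chosen only for illustration.

```javascript
// \w covers [a-zA-Z0-9_]; the leading hyphen in the class is a literal hyphen.
// The ^ and $ anchors require every character of the post to be in the class.
const basicPostPattern = /^[-\w]+$/;

// Hypothetical helper: returns true when the text uses only allowed characters.
function isBasicPost(text) {
  return basicPostPattern.test(text);
}

console.log(isBasicPost("hello_world-42")); // true
console.log(isBasicPost("hello\u{1F600}")); // false: the emoticon is not in the class
```

The second call fails because nothing in the class covers emoticons yet, which is the gap the rest of the recipe addresses.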

Matching emoticons is a bit more complicated. In Unicode, the Emoticons block is a contiguous range of 80 code points (U+1F600 to U+1F64F), each one an emoji. Adding all of these code points to your character class one by one would be a real chore.

You need a solution that enables you to define a range of emojis in the character class, just as you’d define ...
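The following is a sketch of how the u flag named in the recipe title could allow such a range, not necessarily the book's own solution. It assumes the Emoticons block runs from U+1F600 to U+1F64F; the names postPattern, isAllowedPost, and compilesWithoutUFlag are hypothetical.

```javascript
// With the u flag, \u{...} escapes denote whole code points, so a range
// between two emoji behaves like any other character class range.
// Assumption: the Emoticons block is U+1F600 to U+1F64F.
const postPattern = /^[-\w\u{1F600}-\u{1F64F}]+$/u;

// Hypothetical helper: true when the text uses only allowed characters.
function isAllowedPost(text) {
  return postPattern.test(text);
}

// Hypothetical helper: reports whether a pattern source compiles without the u flag.
function compilesWithoutUFlag(source) {
  try {
    new RegExp(source);
    return true;
  } catch (error) {
    return false;
  }
}

console.log(isAllowedPost("nice-post_\u{1F600}")); // true
console.log(isAllowedPost("nice post"));           // false: spaces are not allowed
console.log(isAllowedPost("\u{1F680}"));           // false: U+1F680 lies outside the block

// Without the u flag, each emoji is read as two surrogate code units, so the
// range endpoints are out of order and the pattern is rejected.
console.log(compilesWithoutUFlag("[\u{1F600}-\u{1F64F}]")); // false
```

The last call illustrates why the flag matters: without it the engine sees UTF-16 code units rather than code points, and the same range is a syntax error.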
