Recipe 18 | Counting Unicode Characters with Intl.Segmenter()
Task
Let’s say you have an app that requires user registration and includes a text input for the user’s bio. The bio may be in any language and may include emojis. You want to limit the bio to exactly 120 characters, so you need to calculate the length of the string. Easy! Read the string’s length property to get the number of characters:
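A minimal sketch of that check follows. The sample strings are illustrative assumptions, not taken from the original listing:

```javascript
// Hypothetical sample bios, chosen only to illustrate the behavior.
const plain = "Hello";
const withEmoji = "Hello 🐱";

console.log(plain.length);     // 5
console.log(withEmoji.length); // 8: the cat emoji occupies two UTF-16 code units
```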
This is probably not the result you expected. Strings in JavaScript are encoded as UTF-16, which represents each character using one or two 16-bit code units, and the length property counts code units rather than characters. Some characters, such as the Chinese character for kǒu cái (eloquence) and the cat emoji, require ...
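The recipe title points to Intl.Segmenter as the tool for counting characters the way a reader perceives them. The following is a rough sketch under stated assumptions, not the book's own solution: countGraphemes and isWithinLimit are hypothetical helper names, "en" is an assumed default locale, and the 120-character limit comes from the task description.

```javascript
// Count user-perceived characters (grapheme clusters) instead of UTF-16 code units.
// countGraphemes is a hypothetical helper; the "en" default locale is an assumption.
function countGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  // segment() returns an iterable of segments, so spreading it yields one entry per grapheme.
  return [...segmenter.segment(text)].length;
}

// Hypothetical limit check for the bio field described in the task.
function isWithinLimit(text, limit = 120) {
  return countGraphemes(text) <= limit;
}

const bio = "Hi 🐱";
console.log(bio.length);          // 5 UTF-16 code units
console.log([...bio].length);     // 4 code points
console.log(countGraphemes(bio)); // 4 grapheme clusters

// "e" followed by a combining acute accent is displayed as a single character.
const accented = "e\u0301";
console.log(accented.length);          // 2
console.log(countGraphemes(accented)); // 1
```

Counting grapheme clusters rather than code points matters for sequences such as a base letter plus a combining mark, which occupy several code points but appear as a single character on screen.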