APPENDIX C

Unicode

Introduction

This appendix provides an overview of the Unicode character set, the native character set of the JavaScript language. As Unicode contains over 100,000 characters and continues to grow, we won’t list every character; rather, we will cover the basics of codepoints, blocks, categories, encoding, and decoding. The goal is to provide just enough material for you to use many of the features of the character set effectively.

Basic Concepts

A character is a named symbol, such as LATIN SMALL LETTER C WITH CEDILLA, GURMUKHI SIGN ADAK BINDI, or SOUTH WEST BLACK ARROW. Do not confuse a character with a glyph, which is just a picture of a character. For example, the glyph Σ can be used for GREEK CAPITAL LETTER SIGMA as well ...
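As a minimal sketch of the idea that a character is identified by its name rather than by its picture, the snippet below pairs each name mentioned above with its code point and asks JavaScript to produce the corresponding string. The code point values are assumptions taken from the Unicode code charts, not from this appendix, and `formatCodePoint` is a hypothetical helper introduced only for illustration.

```javascript
// Names come from the paragraph above; the code points are looked up in the
// Unicode code charts and are an assumption of this sketch.
const characters = [
  { name: "LATIN SMALL LETTER C WITH CEDILLA", codePoint: 0x00e7 },
  { name: "GURMUKHI SIGN ADAK BINDI", codePoint: 0x0a01 },
  { name: "SOUTH WEST BLACK ARROW", codePoint: 0x2b0b },
  { name: "GREEK CAPITAL LETTER SIGMA", codePoint: 0x03a3 },
];

// Hypothetical helper: render a code point in the conventional U+XXXX form,
// padded to at least four hexadecimal digits.
function formatCodePoint(codePoint) {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

for (const { name, codePoint } of characters) {
  // String.fromCodePoint builds a string from the numeric code point.
  // How that string is drawn (its glyph) depends on the font in use.
  const text = String.fromCodePoint(codePoint);
  console.log(formatCodePoint(codePoint), name, text);
}
```

Whether each line of output shows a recognizable picture depends on the fonts installed on the machine running the code, which is one way to see that the glyph is separate from the character itself.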
