Unicode Primer

Before the Unicode standard was developed, there were multiple character encoding schemes that were inadequate and that, at times, conflicted with each other. It was nearly impossible to develop global applications that were consistent because no single character encoding scheme could support all characters.

Unicode is a standard for character encoding that resolves these problems. It was developed and is maintained by the Unicode Consortium. The Unicode Standard and Unicode Character Database, or UCD, define what is included in each version.

Oracle’s Unicode character sets allow you to store and retrieve data from more than 200 different individual character sets. Using a Unicode character set provides support for all of these character sets without requiring any engineering changes to an application.

Oracle Database 11g Release 2 supports Unicode version 5.0. First published in 2006, Unicode 5.0 includes the capacity to encode more than 1 million characters. This is enough to support all modern characters, as well as many ancient or minor scripts. At the time of this writing, Unicode 5.1 is the most current published Unicode version.
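The "more than 1 million" figure follows from the layout of the Unicode code space. As a rough sketch in Python (not Oracle code; the variable names are illustrative only), the arithmetic is:

```python
import sys

# The Unicode code space runs from U+0000 to U+10FFFF:
# 17 planes of 65,536 code points each.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
total_code_points = PLANES * CODE_POINTS_PER_PLANE

# The surrogate range U+D800..U+DFFF is reserved for UTF-16 pairs
# and never represents a character on its own.
surrogates = 0xDFFF - 0xD800 + 1
encodable = total_code_points - surrogates

print(total_code_points)  # 1114112
print(encodable)          # 1112064

# Python exposes the highest code point as sys.maxunicode (0x10FFFF).
assert sys.maxunicode + 1 == total_code_points
```

Only a fraction of these code points is assigned to characters in any given Unicode version; the figure describes capacity, not the number of characters defined.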

Unicode character sets in Oracle Database 11g include UTF-8 and UTF-16 encoding. UTF-8 stores non-supplementary characters in 1, 2, or 3 bytes, depending on the character. UTF-16 stores each of those characters in 2 bytes. Supplementary characters are supported with both encoding schemes, and these require 4 bytes per character regardless of the Unicode character set ...
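The byte counts described above can be checked outside the database. The following is a minimal sketch using Python's standard UTF-8 and UTF-16 encoders, not Oracle's character set implementation, so Oracle-specific character sets may differ in detail. The helper name byte_lengths and the sample characters are chosen here purely for illustration.

```python
def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    # "utf-16-be" is used because plain "utf-16" prepends a 2-byte
    # byte order mark, which would inflate the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# Illustrative samples: three non-supplementary characters and one
# supplementary character (a code point above U+FFFF).
samples = {
    "A": "Latin capital A, U+0041",
    "\u00e9": "e with acute accent, U+00E9",
    "\u20ac": "euro sign, U+20AC",
    "\U0001F600": "supplementary character, U+1F600",
}

for ch, label in samples.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The first three samples take 1, 2, and 3 bytes in UTF-8 and 2 bytes each in UTF-16, while the supplementary character takes 4 bytes in both encodings.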
