Real World XML

by Steven Holzner
January 2003
Content level: Beginner to intermediate
1200 pages
23h 42m
English
Peachpit Press
Content preview from Real World XML

ASCII, Unicode, and the Universal Character System

The actual characters in documents are stored as numeric codes, and today the most common code set is the American Standard Code for Information Interchange (ASCII). ASCII codes extend from 0 to 127; for example, the ASCII code for A is 65, the ASCII code for B is 66, and so on.
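
As a minimal sketch of the character-to-code mapping described above, the following Python snippet looks up the codes for A and B and checks whether a string stays within the ASCII range. The helper name `is_ascii` is hypothetical, introduced only for illustration.

```python
# Look up the numeric code behind a character, and go back again.
codes = {ch: ord(ch) for ch in "AB"}
print(codes)      # {'A': 65, 'B': 66}
print(chr(67))    # C

# Hypothetical helper: not from the book, just an illustration.
def is_ascii(text: str) -> bool:
    """Return True if every character falls in the ASCII range 0 to 127."""
    return all(ord(ch) <= 127 for ch in text)

print(is_ascii("XML"))     # True
print(is_ascii("\u05d0"))  # False: Hebrew alef lies outside ASCII
```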

On the other hand, the World Wide Web is just that today—worldwide. And plenty of scripts are not handled by ASCII, including Bengali, Armenian, Hebrew, Thai, Tibetan, Japanese Katakana, Arabic, and Cyrillic.

For that reason, the default character set specified for XML by W3C is Unicode, not ASCII. The 16-bit Unicode codes covered here are made up of 2 bytes, not 1, so they extend from 0 to 65,535 instead of just the 0 to 255 that a single byte can hold (however, to make things ...
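
To make the byte counts concrete, here is a small Python sketch that prints the numeric code of a few characters from the scripts mentioned earlier, along with how many bytes each takes in two common Unicode encodings. The sample characters and the helper name `describe` are assumptions chosen for illustration. All four samples fit in 2 bytes in UTF-16, though that holds only for characters with codes up to 65,535.

```python
# Sample characters, three of them from scripts that ASCII cannot represent.
samples = {
    "Latin A": "A",
    "Hebrew alef": "\u05d0",
    "Thai ko kai": "\u0e01",
    "Katakana a": "\u30a2",
}

# Hypothetical helper: not from the book, just an illustration.
def describe(ch: str) -> tuple[int, int, int]:
    """Return (code, UTF-16 byte count, UTF-8 byte count) for one character."""
    return ord(ch), len(ch.encode("utf-16-be")), len(ch.encode("utf-8"))

for name, ch in samples.items():
    code, utf16_len, utf8_len = describe(ch)
    print(f"{name}: code {code}, {utf16_len} bytes in UTF-16, {utf8_len} in UTF-8")
```

In UTF-8 the same characters take between 1 and 3 bytes, with the ASCII letter A still occupying a single byte.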


ISBN: 0735712867