Skip to Content
XML in a Nutshell, 3rd Edition
book

XML in a Nutshell, 3rd Edition

by Elliotte Rusty Harold, W. Scott Means
September 2004
Intermediate to advanced
712 pages
24h 45m
English
O'Reilly Media, Inc.
Content preview from XML in a Nutshell, 3rd Edition

The Default Character Set for XML Documents

Before an XML parser can read a document, it must know which character set and encoding the document uses. In some cases, external metainformation tells the parser what encoding the document uses. For instance, an HTTP header may include a Content-type header like this:

Content-type: text/html; charset=ISO-8859-1

However, XML parsers generally can’t count on the availability of such information. Even if they can, they can’t necessarily assume that it’s accurate. Therefore, an XML parser will attempt to guess the character set based on the first several bytes of the document. The main checks the parser makes include the following:

  • If the first two bytes of the document are #xFEFF , then the parser recognizes the bytes as the Unicode byte-order mark . It then guesses that the document is written in the big-endian, UTF-16 encoding of Unicode. With that knowledge, it can read the rest of the document.

  • If the first two bytes of the document are #xFFFE, then the parser recognizes the little-endian form of the Unicode byte-order mark. It now knows that the document is written in the little-endian, UTF-16 encoding of Unicode, and with that knowledge it can read the rest of the document.

  • If the first four bytes of the document are #x3C3F786D, that is, the ASCII characters <?xm, then it guesses that the file is written in a superset of ASCII. In particular, it assumes that the file is written in the UTF-8 encoding of Unicode. Even if it’s wrong, ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

XML: Visual QuickStart Guide, Second Edition

XML: Visual QuickStart Guide, Second Edition

Kevin Howard Goldberg
XML Hacks

XML Hacks

Michael Fitzgerald

Publisher Resources

ISBN: 0596007647Errata PageSupplemental Content