XML Names
The XML specification can be quite legalistic and picky at times. Nonetheless, it tries to be efficient. One way it does so is by reusing the same rules for different items wherever it can. For example, the rules for XML element names are also the rules for XML attribute names, as well as for the names of several less common constructs. Collectively, these are referred to simply as XML names.
Element and other XML names may contain essentially any alphanumeric character. This includes the standard English letters A through Z and a through z as well as the digits 0 through 9. XML names may also include non-English letters, numbers, and ideograms, such as ö, ç, Ω, and 串. In addition, they may include these three punctuation characters:
| _ | The underscore |
| - | The hyphen |
| . | The period |
XML names may not contain other punctuation characters such as quotation marks, apostrophes, dollar signs, carets, percent symbols, and semicolons. The colon is allowed, but its use is reserved for namespaces as discussed in Chapter 4. XML names may not contain whitespace of any kind, whether a space, a carriage return, a line feed, a nonbreaking space, or anything similar. Finally, all names beginning with the string “XML” (in any combination of case) are reserved for standardization in W3C XML-related specifications.
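The character rules described so far can be approximated in a few lines of Python. This is only a rough sketch, not a conforming implementation: the function names `has_only_name_characters` and `is_reserved` are invented for illustration, and Python's `str.isalnum()` is assumed to be a reasonable stand-in for the letters, digits, and ideograms the specification permits, although its character classes do not match XML's exactly. The sketch checks only which characters appear in a name; it does not cover any further constraints the specification places on names.

```python
# Punctuation characters permitted in XML names, per the list above.
ALLOWED_PUNCTUATION = {"_", "-", "."}


def has_only_name_characters(name: str, allow_colon: bool = False) -> bool:
    """Approximate check that a name uses only permitted characters.

    Assumption: str.isalnum() approximates XML's letter, digit, and
    ideogram classes. It is not an exact match for the specification.
    """
    if not name:
        return False
    for ch in name:
        if ch.isalnum() or ch in ALLOWED_PUNCTUATION:
            continue
        # The colon is legal but reserved for namespaces, so it is opt-in.
        if ch == ":" and allow_colon:
            continue
        # Whitespace and all other punctuation are rejected.
        return False
    return True


def is_reserved(name: str) -> bool:
    """True if the name begins with "xml" in any combination of case."""
    return name.lower().startswith("xml")
```

Under these assumptions, `has_only_name_characters("first_name")` and `has_only_name_characters("Ω-串.ö")` both return `True`, while `"first name"` and `"price$"` are rejected. A name such as `"XmlData"` passes the character check but is flagged by `is_reserved`.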
Tip
The primary new feature in XML 1.1 is that XML names may contain characters defined only in Unicode 3.0 and later. XML 1.0 is limited to the characters defined as of Unicode 2.0. Additional ...