The XML specification divides Unicode into five overlapping sets:
- Name characters
Characters that can appear in an element, attribute, or entity name. These characters are letters, ideographs, digits, and the punctuation marks _, -, ., and :. In the tables that follow, name characters are shown in bold type, such as A, Å, Ą, Д, ئ, 1, 2, 3, α, ℵ, and _.
One of the major differences between XML 1.0 and XML 1.1 is which characters count as name characters. All XML 1.0 name characters are also XML 1.1 name characters, but XML 1.1 promotes many other characters to name characters as well. Some of these, such as the Burmese and Mongolian letters, reasonably deserve to be name characters. However, XML 1.1 also allows many problematic characters, including ligatures such as ĳ, currency symbols such as the Greek drachma sign, letter-like symbols such as ™, number forms such as Roman numerals, and presentation forms. Finally, it allows characters not defined as of Unicode 3.1.1 and nearly all characters from beyond the basic multilingual plane, including such strange things as the musical symbol for a six-string fretboard. Unless you are working in a language such as Burmese or Mongolian that requires these new characters, we recommend that you restrict the names in your markup to characters that are legal in XML 1.0. The tables that follow are based on XML 1.0 rules.
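To make the XML 1.1 behavior concrete, here is a minimal Python sketch of a name-character check. The code point ranges are transcribed from the NameStartChar and NameChar productions of the XML 1.1 recommendation; the function and constant names are hypothetical, chosen only for this illustration. It does not implement the XML 1.0 rules on which the tables that follow are based.

```python
# Sketch of the XML 1.1 NameStartChar and NameChar productions.
# All names below are illustrative, not part of any library.

# Code point ranges (inclusive) allowed as the first character of a name.
NAME_START_RANGES = [
    (0x3A, 0x3A),                      # ":"
    (0x41, 0x5A),                      # A-Z
    (0x5F, 0x5F),                      # "_"
    (0x61, 0x7A),                      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),                # planes 1 through 14
]

# Additional ranges allowed after the first character.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),                      # "-" and "."
    (0x30, 0x39),                      # 0-9
    (0xB7, 0xB7),                      # middle dot
    (0x300, 0x36F),                    # combining diacritical marks
    (0x203F, 0x2040),                  # undertie and character tie
]


def _in_ranges(ch, ranges):
    """Return True if the single character ch falls in any inclusive range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char_11(ch):
    """True if ch may begin an XML 1.1 name."""
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char_11(ch):
    """True if ch may appear anywhere in an XML 1.1 name."""
    return is_name_start_char_11(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_name_11(s):
    """True if s is a non-empty, well-formed XML 1.1 name."""
    return (
        bool(s)
        and is_name_start_char_11(s[0])
        and all(is_name_char_11(c) for c in s[1:])
    )
```

Under these ranges, `is_name_char_11("\u20af")` (the Greek drachma sign) and `is_name_char_11("\U0001D11D")` (the six-string fretboard symbol) both return `True`, while `is_name_11("1abc")` returns `False` because a digit cannot start a name.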
- Name start characters
Characters that can be the first character of an element, attribute, or entity name. These characters are ...