
characters may be lost. However, an implementation of Unicode is required to preserve
characters rather than, for example, dropping characters that it does not recognize.
It may well fail to display them, but they should be available in the data by other means.
For example, with the first two methods just discussed, the expression m² survives cut
and paste intact: the result is m². (For method 2, we assume that
you cut from the formatted document, not from XML or HTML source.) For methods
3 and 4, cut and paste normally converts the text to “m2,” unless the operation takes
place inside a program or between programs that recognize the method used. Thus, if
you copy and paste the string “m2,” where “2” is formatted as a superscript, the
formatting is preserved when working inside a word processor, but not when copying
from it into a plain text editor like Notepad. When method 5 is used, the data copied
is of course “m2.”
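The difference between character-level and formatting-level information can be sketched in a few lines of Python. This is a hypothetical model, not how any particular word processor stores documents: formatting is represented as separate (start, end, style) runs that live outside the characters, and the helper names `to_html` and `to_plain_text` are invented for illustration. A superscript expressed as the character U+00B2 SUPERSCRIPT TWO survives any conversion to plain text, whereas a superscript expressed only as a run is lost as soon as the runs are dropped.

```python
import unicodedata

def to_html(text, runs):
    """Render text plus hypothetical (start, end, style) formatting runs as HTML.
    Only the 'superscript' style is handled in this sketch."""
    out, pos = [], 0
    for start, end, style in sorted(runs):
        out.append(text[pos:start])
        if style == "superscript":
            out.append(f"<sup>{text[start:end]}</sup>")
        else:
            out.append(text[start:end])
        pos = end
    out.append(text[pos:])
    return "".join(out)

def to_plain_text(text, runs):
    """Plain text has nowhere to store formatting runs, so only the characters survive."""
    return text

# Superscript at the character level: the information is part of the data.
char_level = "m\u00b2"
# Superscript as formatting: the characters are just "m2"; the rest is in the runs.
formatted_text, runs = "m2", [(1, 2, "superscript")]

for ch in to_plain_text(char_level, []):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(to_html(formatted_text, runs))        # a program that understands the runs
print(to_plain_text(formatted_text, runs))  # pasting into a plain text editor
```

Running it shows the code points of “m²” (including SUPERSCRIPT TWO), then `m<sup>2</sup>` for the program that recognizes the formatting, and finally `m2` once only the plain characters are kept.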
Similarly, when data is read by a program, information expressed at the character level
is always available to the program, though the program may not make use of it. Information
expressed in markup is normally available, too, since programs usually read the markup
source, but they need to recognize the markup, at least to the extent that they can
skip it instead of treating it as data! Reading data in a word processor’s internal
format is possible, ...
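To make the point about markup concrete, here is a minimal sketch using Python's standard `html.parser` module, with HTML `<sup>` markup as the assumed example. A program that recognizes tags only well enough to skip them reads `m<sup>2</sup>` as “m2”; a program that also understands `<sup>` can keep the information, here by mapping digits to Unicode superscript characters. The class `TextExtractor`, the function `extract_text`, and the digit-only mapping are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects character data and skips tags. If keep_superscripts is True,
    digits inside <sup> are mapped to Unicode superscript digits."""
    # Superscript digits 0-9: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079.
    SUPERSCRIPTS = str.maketrans(
        "0123456789",
        "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
    )

    def __init__(self, keep_superscripts=False):
        super().__init__()
        self.keep_superscripts = keep_superscripts
        self.sup_depth = 0  # how many <sup> elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self.sup_depth > 0:
            self.sup_depth -= 1

    def handle_data(self, data):
        if self.keep_superscripts and self.sup_depth:
            data = data.translate(self.SUPERSCRIPTS)
        self.parts.append(data)

def extract_text(html, keep_superscripts=False):
    parser = TextExtractor(keep_superscripts)
    parser.feed(html)
    parser.close()  # flush any buffered character data
    return "".join(parser.parts)

source = "<p>m<sup>2</sup></p>"
print(extract_text(source))                          # markup skipped, superscript lost
print(extract_text(source, keep_superscripts=True))  # markup understood
```

The first call prints `m2` and the second prints `m²`. Only digits are mapped because Unicode has superscript forms for a limited set of characters; a real converter would need a fallback for anything else inside `<sup>`.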