
Character and String Data
Processing of character data in computers operates on characters represented by code
numbers. This is often expressed by saying that characters are treated as small integers,
though especially when using Unicode, they need not be that small. A string is usually
represented as a sequence of characters in consecutive storage locations. Otherwise,
the representation and handling of characters vary greatly by programming language
and by software module.
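As a minimal sketch of the idea that characters are code numbers, the following Python fragment converts a few characters to their numbers and back. Python is chosen here only for illustration, and the variable names are made up for this example. Note that a Python string is a sequence of Unicode code points; how they are laid out in storage is an implementation detail that the sketch does not show.

```python
# Characters are handled as code numbers: ord() gives the number of a
# character, and chr() gives the character for a number.
code_a = ord("A")                 # 65, a "small integer"
code_euro = ord("\u20ac")         # euro sign, U+20AC
code_clef = ord("\U0001D11E")     # musical G clef, U+1D11E: not small at all

# A string is a sequence of characters, so it maps to a sequence of numbers.
word = "A\u00f1\u20ac"            # A, n with tilde, euro sign
codes = [ord(ch) for ch in word]

# The original string can be rebuilt from the code numbers alone.
rebuilt = "".join(chr(n) for n in codes)

print(code_a, code_euro, code_clef)
print(codes)
print(rebuilt == word)
```

The third value illustrates why "small integers" is only an approximation: the code number of the G clef does not fit in 16 bits.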
Constructs and Principles of Processing Characters
For the processing of character data, the design of a programming language needs to solve
several problems, and the solutions greatly affect the suitability of the language for
string-oriented tasks. You are probably not designing a new programming language, but you
may need to choose among existing languages for a project, or to learn or teach
a language. In learning and teaching, the phenomenon that psychologists call negative
transfer is often problematic: when you have learned one way of doing things in a language
(say, the difference between single and double quotes around a literal), you will
implicitly assume that another language works the same way. Even after you have learned
the difference, you keep forgetting it. Therefore, it is useful to make some explicit
comparisons.
The key features in the processing of character data in a programming language are:
• Repertoire: ...