
CHAPTER 11
Characters in Programming
This chapter presents a number of ways to represent character and string data in
different programming languages, such as FORTRAN, C, C#, Perl, ECMAScript
(JavaScript), and Java, and also in other languages such as XML and CSS. It explores
both the differences and the similarities, illustrated with sample programs that perform
simple manipulation of string data. The information is presented to introduce you to
using Unicode in programming in different languages. For serious programming, you
will also need to study language manuals and library documentation.
You need to understand some basics of programming to benefit from this chapter. You
should be able to write a program that prints “Hello world,” and you should know how
to declare variables and assign values to them, write expressions and conditional
statements, and use subprograms. Here we will discuss the specifics of processing
character and string data. One reason for this is that even people who know
programming well may be confused about the fundamental concepts and fail to
distinguish, for example, between an empty string, a space character, the NUL
character, and the digit zero. Programming language tutorials typically discuss the
character concept rather briefly, often assuming that only ASCII data will be used.
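To make the distinction between the four values mentioned above concrete, the following minimal sketch prints the length and the code points of each. It is written in Python only for brevity; the names samples and describe are made up for this illustration and do not come from any library.

```python
# Four values that are easy to confuse, keyed by a descriptive label.
samples = {
    "empty string": "",       # contains no characters at all
    "space character": " ",   # one character, U+0020 SPACE
    "NUL character": "\0",    # one character, U+0000 (a control character)
    "digit zero": "0",        # one character, U+0030 DIGIT ZERO
}


def describe(text: str) -> str:
    """Return the length of text and the code point of each character in it."""
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    return f"length {len(text)}, code points: [{code_points}]"


for name, value in samples.items():
    print(f"{name}: {describe(value)}")
```

The empty string has length zero and no code points, whereas each of the other three is a string of length one that holds a different character. Other languages draw the same distinctions, although the notations for the NUL character and for character codes vary.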
The International Components for Unicode (ICU) activity, based on the open ...