Character Sets

ANSI C defines two character sets. The first is the source character set, which is the set of characters that may be used in a source file. The second is the execution character set, which consists of all the characters that are interpreted during the execution of the program, such as the characters in a string constant.

Each of these character sets contains a basic character set, which includes the following:

  • The 52 upper- and lower-case letters of the Latin alphabet:

    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
  • The ten decimal digits (where the value of each digit character after 0 is one greater than that of the preceding digit):

    0  1  2  3  4  5  6  7  8  9
  • The following 29 graphic characters:

    !  "  #  %  &  '  (  )  *  +  ,  -  .  /  :  ;
    <  =  >  ?  [  \  ]  ^  _  {  |  }  ~
  • The five whitespace characters:

    space, horizontal tab, vertical tab, newline, form feed
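The guarantee about the decimal digits is what makes the common `c - '0'` idiom portable. The following is a minimal sketch of that idiom; the function name digit_value is hypothetical and chosen only for illustration.

```c
/* Hypothetical helper (not a standard library function):
   convert a decimal digit character to its numeric value.
   Returns -1 if c is not one of '0' through '9'. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';   /* valid because the digit values are consecutive */
    return -1;            /* not a decimal digit */
}
```

With this helper, digit_value('7') yields 7 regardless of the character encoding in use. No comparable guarantee is stated above for the letters, so the same trick should not be assumed to work for 'a' through 'z'.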

In addition, the basic execution character set contains the following:

  • The null character \0, which terminates a character string

  • The control characters represented by simple escape sequences, shown in Table 1-1, for controlling output devices such as terminals or printers

Table 1-1. The standard escape sequences

Escape sequence  Action on display device  Escape sequence                Action on display device
\a               Alert (beep)              \'                             The character '
\b               Backspace                 \"                             The character "
\f               Form feed                 \?                             The character ?
\n               Newline                   \\                             The character \
\r               Carriage return           \o \oo \ooo (o = octal digit)  ...
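As a rough illustration of the null character and the simple escape sequences, the sketch below counts the characters in a string up to the terminating \0. The names count_chars and quoted are hypothetical; the standard library function strlen() from <string.h> performs the same count.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters that precede the
   terminating null character of the string s. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* the null character marks the end */
        ++n;
    return n;
}

/* Each simple escape sequence stands for a single character:
   this array holds  "  H  i  "  newline  and the terminating \0. */
const char quoted[] = "\"Hi\"\n";
```

Here count_chars(quoted) yields 5, while sizeof quoted is 6, because the compiler appends the null character to every string constant. Likewise, "a\\b" contains three characters, since \\ represents one backslash.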
