Appendix E. Character Encodings

In Appendix D, I discussed how computers store information and how a character-encoding scheme is a table that translates between characters and the way they are stored in the computer.
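To make the idea of an encoding as a lookup table concrete, here is a minimal Python sketch. It uses a deliberately tiny, hypothetical table (`TOY_TABLE`) covering only three characters, turns text into the numbers that would be stored and back again, and then checks those numbers against Python's built-in `ascii` codec.

```python
# A toy "character encoding": a table that maps each character to a number.
# TOY_TABLE is hypothetical and covers only three characters, for illustration.
TOY_TABLE = {"H": 72, "i": 105, "!": 33}
REVERSE_TABLE = {code: char for char, code in TOY_TABLE.items()}


def encode(text: str) -> list[int]:
    """Translate each character into the number stored for it."""
    return [TOY_TABLE[char] for char in text]


def decode(codes: list[int]) -> str:
    """Translate stored numbers back into characters."""
    return "".join(REVERSE_TABLE[code] for code in codes)


stored = encode("Hi!")
print(stored)                       # [72, 105, 33]
print(decode(stored))               # Hi!

# The real ASCII table assigns the same numbers to these characters.
print(list("Hi!".encode("ascii")))  # [72, 105, 33]
```

A real encoding works the same way, only with a table large enough to cover every character it supports.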

The most common character set (or character encoding) on computers is ASCII (the American Standard Code for Information Interchange). You can expect every computer browsing the Web to understand ASCII.

Character Set    Description
ASCII            American Standard Code for Information Interchange, which is used on most computers

The problem with ASCII is that it supports only the upper- and lowercase Latin alphabet, the digits 0-9, and some punctuation and symbols: 128 characters in all. Of these, 33 are non-printing control characters, such as line feed and carriage return. The remaining 95 are the printable characters shown here (the first one is the space character).

(space) ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~
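The printable characters listed above are exactly the ASCII codes 32 (space) through 126 (~). The rest of the 128 codes, 0 through 31 plus 127, are control characters. A short Python sketch can confirm those counts and print the printable set 16 characters to a row:

```python
# Printable ASCII runs from code 32 (space) to code 126 (~).
# Codes 0-31 and 127 (DEL) are non-printing control characters.
PRINTABLE = [chr(code) for code in range(32, 127)]
CONTROL = [code for code in range(128) if code < 32 or code == 127]

print(len(PRINTABLE), len(CONTROL))   # 95 33
print(ord("\n"), ord("\r"))           # 10 13 (line feed, carriage return)

# Print the printable characters 16 to a row.
for start in range(0, len(PRINTABLE), 16):
    print(" ".join(PRINTABLE[start:start + 16]))
```

Python's `str.isprintable()` agrees with this split: it returns `True` for every character in `PRINTABLE` (including the space) and `False` for every control code.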

However, many languages use either accented Latin characters or completely different alphabets. ASCII has no codes for these characters, so you need to learn about character encodings if you want to use any non-ASCII characters.
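The following Python sketch shows this limit in practice. The helper name `fits_in` is hypothetical, and UTF-8 is used only as one example of an encoding that can store characters ASCII cannot. Python's `str.encode` raises `UnicodeEncodeError` when a character has no code in the target encoding.

```python
def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character in text can be stored in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


word = "café"  # "é" is an accented Latin character with no ASCII code

print(fits_in(word, "ascii"))   # False
print(fits_in(word, "utf-8"))   # True

# In UTF-8 the ASCII letters keep their one-byte codes; "é" needs two bytes.
print(word.encode("utf-8"))     # b'caf\xc3\xa9'
```

Note that the plain ASCII letters `c`, `a`, and `f` come out as the same single bytes they have in ASCII. Only the accented character needs extra bytes.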

Character encodings are also particularly important if you want to use symbols, because these are not guaranteed to transfer correctly between different encodings (this applies to characters ranging from certain dashes to some quotation marks). If you do not indicate the character encoding the document ...
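To illustrate how dashes and quotation marks can be garbled, here is a minimal Python sketch of one assumed scenario: text saved as UTF-8 but read back as Windows-1252 (`cp1252` in Python). Other mismatched pairs of encodings produce different, but similarly broken, results.

```python
# Text containing a curly apostrophe (U+2019) and an em dash (U+2014).
text = "It\u2019s a \u2014 dash"

# Assumed scenario: the bytes are saved as UTF-8...
raw = text.encode("utf-8")

# ...but the reader wrongly assumes Windows-1252 ("cp1252" in Python).
misread = raw.decode("cp1252")
print(misread)                       # Itâ€™s a â€” dash

# Decoding with the encoding that was actually used restores the text.
print(raw.decode("utf-8") == text)   # True
```

The plain ASCII letters and spaces survive because both encodings agree on them. Only the symbols, each stored as several bytes in UTF-8, turn into sequences of unrelated characters.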
