Learning XML by Erik T. Ray

7.1. Character Sets and Encodings

Computers don't understand letters or symbols of any kind; numbers are all they know. Every file, whether a spreadsheet, letter, or XML document, is really just a long string of binary digits inside the computer. The data is encoded, meaning that every symbol is represented by a unique number in the file. Software translates the characters you type on the keyboard into these numerical codes, and another program translates them back into human-recognizable text.
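The round trip described above can be sketched in a few lines of Python. This is only an illustration: the sample string and the choice of UTF-8 as the encoding are assumptions made for the example, not something the text prescribes.

```python
# Illustrative sketch: the sample text and the UTF-8 encoding are assumptions.
text = "XML"

# Each character corresponds to a number (its code point).
code_points = [ord(ch) for ch in text]

# Encoding turns the characters into the bytes that are actually stored.
encoded = text.encode("utf-8")

# The same bytes shown as binary digits, eight per byte.
bits = " ".join(f"{byte:08b}" for byte in encoded)

# Decoding translates the numbers back into human-readable text.
decoded = encoded.decode("utf-8")

print(code_points)  # [88, 77, 76]
print(bits)         # 01011000 01001101 01001100
print(decoded)      # XML
```

The decoding step only recovers the original text because it uses the same encoding that produced the bytes.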

An example of this process is Morse code. To transmit text over wires, a telegraph operator breaks down the text into individual letters, numbers, and symbols. She translates each of these into its unique Morse equivalent, a series of short and long signals, and transmits the message over the wire. On the receiving end, another operator translates the code back into text and scribbles the message onto a notepad. Sending email works in a similar fashion: you type in the message with a keyboard, software translates the keystrokes into numbers, the sequence is sent through the network to its destination, and the numbers are converted back into text and displayed on the recipient's screen.
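The two operators' jobs can be mimicked with a pair of lookup tables. The sketch below is deliberately incomplete: it includes only four letters of International Morse code, and the names MORSE, REVERSE, to_morse, and from_morse are hypothetical helpers invented for the illustration.

```python
# Partial table for illustration only: four letters of International Morse code.
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---"}

# The receiving operator needs the inverse mapping, from signal to letter.
REVERSE = {signal: letter for letter, signal in MORSE.items()}

def to_morse(message):
    """Sending side: translate each letter into its Morse equivalent."""
    return " ".join(MORSE[ch] for ch in message.upper())

def from_morse(signals):
    """Receiving side: translate space-separated signals back into letters."""
    return "".join(REVERSE[s] for s in signals.split())

sent = to_morse("sos")
print(sent)              # ... --- ...
print(from_morse(sent))  # SOS
```

Both sides must share the same table, which is the role a character set plays for the software at each end of an email exchange.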

The mapping of characters to numerical values creates a character set. The term character describes any piece of text or signal that can be represented in a single position in the character set. For example, the letter "Q" from the Latin alphabet is a single character, as is its lowercase cousin "q". ...
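As a small illustration of characters occupying single positions, the following sketch asks Python for the positions of "Q" and "q". It assumes Python's built-in ord and chr, which work with Unicode code points; other character sets may assign different numbers.

```python
# Assumption: ord() and chr() use Unicode code points, one possible character set.
upper = ord("Q")
lower = ord("q")

print(upper, lower)             # 81 113
print(chr(upper), chr(lower))   # Q q

# The two letters are distinct characters because they occupy distinct positions.
print(upper != lower)           # True
```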
