Appendix B. Base64

Base64 is a system for representing raw byte data as ASCII characters. You could use hexadecimal for the same purpose, but it’s not as efficient. Each hex digit, stored as one eight-bit ASCII character, represents only four bits of input, so data represented in hexadecimal is double the size of the original. As its name implies, base64 improves this ratio by representing six bits with each digit. Thus, 3 input bytes (3 x 8 = 24 bits) translate into 4 base64 digits (4 x 6 = 24 bits), and the encoded data is only one-third larger than the original. Each base64 digit is represented by an ASCII character. Figure B-1 shows how bytes are converted to base64 digits.

Figure B-1. Byte to base64 conversion
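
The conversion of one 3-byte group into four base64 digits can be sketched in a few lines of Java. This is only an illustration, not the implementation of any particular library: the class name Base64Group and the method encodeGroup are hypothetical, and the alphabet string is the standard 64-character table given in RFC 1521.

```java
// Illustrative sketch only; Base64Group and encodeGroup are hypothetical names.
class Base64Group {
    // The 64-character base64 alphabet: digit value 0 is 'A', 63 is '/'.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Converts exactly three bytes (24 bits) into four base64 digits (4 x 6 bits).
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Pack the three bytes into one 24-bit value; & 0xff undoes sign extension.
        int bits = ((b0 & 0xff) << 16) | ((b1 & 0xff) << 8) | (b2 & 0xff);
        char[] digits = new char[4];
        for (int i = 0; i < 4; i++) {
            // Take six bits at a time, most significant group first.
            int index = (bits >> (18 - 6 * i)) & 0x3f;
            digits[i] = ALPHABET.charAt(index);
        }
        return new String(digits);
    }

    public static void main(String[] args) {
        // The ASCII bytes of "Man" encode to the four digits TWFu.
        System.out.println(encodeGroup((byte) 'M', (byte) 'a', (byte) 'n'));
    }
}
```

Packing the bytes into a single int first keeps the six-bit extraction to one shift and one mask per digit, which mirrors the regrouping of bits shown in the figure.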

Base64 encoding always extends the input data to a multiple of 24 bits (3 bytes) by padding with zero bits. There are three distinct cases:

  • The input data is a multiple of 3 bytes. In this case, no padding is needed.

  • The input data has one extra byte. This byte is split into two base64 digits, and two special padding digits (the = symbol) are added to the end of the base64 representation.

  • The input data has two extra bytes. These bytes are represented by three base64 digits and one padding digit is added to the end of the base64 representation.
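
The three padding cases can be observed with a short program. This sketch assumes a Java 8 or later runtime, where the standard java.util.Base64 class is available; it is not the sun.misc classes, and the class name Base64PaddingDemo and the helper encode are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch only; assumes Java 8+ for java.util.Base64.
class Base64PaddingDemo {
    // Encodes the ASCII bytes of the given text with the standard encoder.
    static String encode(String text) {
        byte[] input = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(input);
    }

    public static void main(String[] args) {
        System.out.println(encode("Man")); // 3 bytes, no padding: TWFu
        System.out.println(encode("M"));   // 1 extra byte, two padding digits: TQ==
        System.out.println(encode("Ma"));  // 2 extra bytes, one padding digit: TWE=
    }
}
```

In every case the output length is a multiple of four characters, which is how a decoder can tell from the trailing = symbols how many bytes the final group really holds.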

The base64 system is fully described in RFC 1521, in section 5.2. You can download this document from ftp://ds.internic.net/rfc/rfc1521.txt.

Sun provides base64 conversion classes in the unsupported sun.misc package. If you don’t ...
