Compressing Office Files That Contain Unicode Text

Because Unicode text usually takes more bytes per character than text in older single-byte encodings, Microsoft Office 2003 Editions files may be larger than the same files saved by earlier, non-Unicode versions of Office. However, Microsoft Office Word 2003 can automatically compress portions of files to reduce their size.
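As a rough illustration of why compression can offset the larger size, the sketch below encodes a repetitive English string at two bytes per character and then compresses it. It rests on assumptions: the text does not say which compression method Word 2003 uses, so Python's zlib module stands in for it, and the sample string is invented for the demonstration.

```python
import zlib

# Hypothetical sample text; repetition makes the effect easy to see.
text = "Office files that contain Unicode text. " * 100

# Two bytes per character; "utf-16-le" avoids adding a byte order mark.
raw = text.encode("utf-16-le")

# zlib is a stand-in here, not the method Word itself uses.
packed = zlib.compress(raw)

print(f"uncompressed: {len(raw)} bytes")
print(f"compressed:   {len(packed)} bytes")
```

For English text, every second byte is zero in this encoding, which is the kind of redundancy a general-purpose compressor removes easily.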

Office 2003 Editions store text in a form of Unicode called UTF-16, just as Office XP does. UTF-16 encodes each character in two bytes (or, very rarely, four bytes), whereas non-Unicode systems use, for example, a single byte per character, or a mixture of one and two bytes for some Asian languages. Generally, Office 2003 Editions files with multilingual text are similar in size to Office 97, Office ...
