PHP in a Nutshell

by Paul Hudson
October 2005
Content level: Intermediate to advanced
372 pages
Estimated reading time: 11h 35m
English
O'Reilly Media, Inc.

Handling Non-English Characters

ASCII itself defines only 128 characters (codes 0 to 127, stored in 7 bits); 8-bit extensions of it, such as ISO-8859-1, use the full byte to describe up to 256 printable and control characters. That range, 0 to 255, is used because it is the number of values a single byte (8 ones and zeros, in computing terminology) can hold. Languages such as Chinese, Korean, and Japanese use thousands of characters, which means you need far more than 256 values, and therefore more than one byte of space per character: you need a multibyte character. The multibyte character implementation in PHP is capable of working with Unicode-based encodings, such as UTF-8; however, at this time, Unicode support in PHP is very weak. Full Unicode support is currently one of the key goals for future releases of PHP.
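To make the byte-versus-character distinction concrete, here is a minimal sketch that counts how many bytes UTF-8 uses for a plain ASCII letter, an accented Latin letter, and a Japanese character. It assumes the script file itself is saved as UTF-8, and the helper name utf8ByteCounts() is hypothetical, chosen only for illustration.

```php
<?php
// Minimal sketch: how many bytes UTF-8 needs for different characters.
// Assumes this file is saved as UTF-8; utf8ByteCounts() is a hypothetical helper.
function utf8ByteCounts(array $chars): array
{
    $counts = [];
    foreach ($chars as $char) {
        // strlen() counts bytes, not characters
        $counts[$char] = strlen($char);
    }
    return $counts;
}

foreach (utf8ByteCounts(['A', 'é', '日']) as $char => $bytes) {
    echo $char, ': ', $bytes, " byte(s)\n"; // A: 1, é: 2, 日: 3
}

// bin2hex() exposes the raw bytes that make up a single Japanese character.
echo bin2hex('日'), "\n"; // e697a5
```

Only the ASCII letter fits in one byte; the other two need a multibyte sequence, which is exactly what single-byte string functions do not expect.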

Dealing with these complex characters is slightly different from working with single-byte characters, because functions like substr() and strtoupper() assume exactly one byte per character and can corrupt a multibyte string, for example by cutting a character in half. Instead, you should use the multibyte equivalents of these functions: mb_strtoupper() instead of strtoupper(), mb_ereg() rather than ereg(), and mb_strlen() rather than strlen(). These functions take the same parameters as their originals, except that most accept an optional extra parameter that specifies which character encoding to use.
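The following sketch contrasts byte-based functions with their multibyte equivalents on a short UTF-8 string, passing the optional encoding argument explicitly. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Sketch: byte-based vs. multibyte-aware string functions.
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$word = 'café'; // 4 characters, but 5 bytes: "é" takes 2 bytes in UTF-8

// Byte-based functions treat every byte as one character.
$byteLength  = strlen($word);         // 5
$brokenSlice = substr($word, 3, 1);   // "\xC3": only the first byte of "é"

// Multibyte equivalents take the same arguments plus an optional encoding.
$charLength = mb_strlen($word, 'UTF-8');      // 4
$slice      = mb_substr($word, 3, 1, 'UTF-8'); // "é"
$upper      = mb_strtoupper($word, 'UTF-8');   // "CAFÉ"

// mb_check_encoding() confirms the byte-sliced result is no longer valid UTF-8.
var_dump(mb_check_encoding($brokenSlice, 'UTF-8')); // bool(false)
var_dump($byteLength, $charLength, $slice, $upper);
```

Passing 'UTF-8' explicitly avoids depending on the internal encoding configured in php.ini, at the cost of repeating the argument on every call.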

If there is an existing script that you'd like to multibyte-enable, there's a special php.ini setting you can change: mbstring.func_overload. By default, this is set to 0, which means functions behave as you would ...

ISBN: 0596100671