Effects of Character Semantics
The upshot of all this is that a typical built-in operator will operate on
characters unless it is in the scope of a use bytes pragma. However, even
outside the scope of use bytes, if all of the operands of the operator are
stored as 8-bit characters (that is, none of the operands are stored in
utf8), then character semantics are indistinguishable from byte semantics,
and the result of the operator will be stored in 8-bit form internally. This
preserves backward compatibility as long as you don't feed your program any
characters wider than Latin-1.
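The following is a minimal sketch of that behavior, not something taken from the text. It assumes a Perl recent enough to provide utf8::is_utf8 (5.8.1 or later, so newer than the 5.6 release discussed here), and the variable names are purely illustrative. It shows that a Latin-1 string stays in 8-bit form with identical character and byte lengths, and that the two diverge only once a wider character is involved.

```perl
use strict;
use warnings;

# A string whose characters all fit in 8 bits ("caf" plus e-acute, 0xE9).
my $latin1 = "caf\xe9";

# utf8::is_utf8 reports whether Perl stores the string as UTF-8 internally.
my $latin1_is_utf8 = utf8::is_utf8($latin1) ? 1 : 0;

# Character length, and byte length taken inside a lexical "use bytes" scope.
my $char_len = length($latin1);
my $byte_len = do { use bytes; length($latin1) };

# When every operand is 8-bit, the result is stored in 8-bit form too.
my $joined         = $latin1 . "!";
my $joined_is_utf8 = utf8::is_utf8($joined) ? 1 : 0;

# A character wider than Latin-1 forces UTF-8 storage for the result.
my $wide         = $latin1 . "\x{263A}";
my $wide_is_utf8 = utf8::is_utf8($wide) ? 1 : 0;
my $wide_chars   = length($wide);
my $wide_bytes   = do { use bytes; length($wide) };

print "latin1: utf8=$latin1_is_utf8 chars=$char_len bytes=$byte_len\n";
print "joined: utf8=$joined_is_utf8\n";
print "wide:   utf8=$wide_is_utf8 chars=$wide_chars bytes=$wide_bytes\n";
```

For the Latin-1 string the two lengths agree, which is why byte-oriented code written before Unicode support keeps working unchanged.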
The utf8 pragma is primarily a compatibility device that enables recognition
of UTF-8 in literals and identifiers encountered by the parser. It may also
be used for enabling some of the more experimental Unicode support features.
Our long-term goal is to turn the utf8 pragma into a no-op.
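As a rough illustration of what "recognition of UTF-8 in literals" means, the sketch below parses the same literal with and without the pragma. It assumes the source file itself is saved as UTF-8, and the variable names are illustrative rather than taken from the text.

```perl
use strict;
use warnings;

# Without "use utf8" the parser sees the two UTF-8 bytes of the literal
# (0xC3 0xA9) as two separate 8-bit characters.
my $bytes_literal = "é";

# Inside the lexical scope of "use utf8" the same two bytes are decoded
# into the single character U+00E9.
my $chars_literal = do { use utf8; "é" };

printf "without utf8: length %d\n", length($bytes_literal);
printf "with utf8:    length %d, code point U+%04X\n",
    length($chars_literal), ord($chars_literal);
```

Because the pragma is lexically scoped, only the literal inside the do block is affected.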
The use bytes pragma will never turn into a no-op. Not only is it necessary
for byte-oriented code, but it also has the side effect of defining
byte-oriented wrappers around certain functions for use outside the scope of
use bytes. As of this writing, the only defined wrapper is for length, but
there are likely to be more as time goes by. To use such a wrapper, say:
    use bytes ();   # Load wrappers without importing byte semantics.
    …
    $charlen = length("\x{ffff_ffff}");          # Returns 1.
    $bytelen = bytes::length("\x{ffff_ffff}");   # Returns 7.
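A self-contained variant of the same idea, sketched here with an ordinary Unicode character instead of the extreme code point above; the choice of U+263A and the variable names are assumptions made for illustration only.

```perl
use strict;
use warnings;
use bytes ();    # Load bytes::length without enabling byte semantics here.

my $smiley = "\x{263A}";    # WHITE SMILING FACE, three bytes in UTF-8

my $charlen = length($smiley);           # counts characters
my $bytelen = bytes::length($smiley);    # counts bytes of internal storage

# For plain ASCII the two views agree.
my $ascii_same = length("abc") == bytes::length("abc") ? 1 : 0;

print "characters: $charlen, bytes: $bytelen, ascii agrees: $ascii_same\n";
```

The empty import list keeps length itself character-oriented, so the byte count is available only through the explicitly qualified bytes::length call.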
Outside the scope of a use bytes declaration, Perl version 5.6 works (or at least, is ...