Effects of Character Semantics
The upshot of all this is that a typical built-in
operator will operate on characters unless it is in the scope of a
use bytes pragma. However, even outside the scope
of use bytes, if all of the operands of the
operator are stored as 8-bit characters (that is, none of the operands
are stored in utf8), then character semantics are indistinguishable
from byte semantics, and the result of the operator will be stored in
8-bit form internally. This preserves backward compatibility as long
as you don't feed your program any characters wider than
Latin-1.
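The sketch below illustrates the distinction. It assumes a modern interpreter (Perl 5.8.1 or later) rather than the 5.6 release discussed here, because it uses the built-in utf8::is_utf8 function to inspect the internal storage flag. The variable names are illustrative only. A string whose characters all fit in 8 bits keeps 8-bit storage when every operand is 8-bit, and length gives the same answer with or without use bytes. A string holding a character above 0xFF is stored as UTF-8, so the character count and the byte count diverge.

```perl
use strict;
use warnings;

# Assumption: Perl 5.8.1 or later, where utf8::is_utf8 is built in.
my $latin = "caf\xe9";            # 4 characters, all <= 0xFF: stored as 8-bit
my $wide  = "caf\xe9\x{263a}";    # U+263A forces UTF-8 storage for the whole string

# Every operand is 8-bit, so the result stays 8-bit internally.
my $joined = $latin . "!";
my $joined_is_utf8 = utf8::is_utf8($joined) ? 1 : 0;

# Character semantics: no "use bytes" in scope here.
my $latin_chars = length($latin);   # 4
my $wide_chars  = length($wide);    # 5

# Byte semantics, confined to this block because the pragma is lexical.
my ($latin_bytes, $wide_bytes);
{
    use bytes;
    $latin_bytes = length($latin);  # 4: 8-bit data, so bytes == characters
    $wide_bytes  = length($wide);   # 8: "caf" (3) + U+00E9 (2) + U+263A (3)
}

print "$joined_is_utf8 $latin_chars $wide_chars $latin_bytes $wide_bytes\n";
# prints "0 4 5 4 8"
```

Only the wide string shows a difference between the two semantics, which is the backward-compatibility property described above.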
The utf8 pragma is primarily a
compatibility device that enables recognition of UTF-8 in literals and
identifiers encountered by the parser. It may also be used to
enable some of the more experimental Unicode support features. Our
long-term goal is to turn the utf8 pragma into a
no-op.
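Here is a minimal sketch of what recognizing UTF-8 in literals and identifiers means in practice. It assumes the source file itself is saved as UTF-8 and that a Perl 5.8 or later interpreter is used. The variable names are hypothetical. The same two source bytes for é are read as two 8-bit characters without the pragma and as a single character with it. Because utf8 is lexically scoped, each block is parsed under its own setting.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so each "é" below is the bytes C3 A9.
my ($as_bytes, $as_chars);
{
    no utf8;                 # parser treats the source text as raw bytes
    $as_bytes = "é";         # two characters: "\xc3" and "\xa9"
}
{
    use utf8;                # parser decodes the source text as UTF-8
    my $café = "é";          # non-ASCII identifier, accepted under use utf8
    $as_chars = $café;       # one character: U+00E9
}

printf "%d %d %X\n", length($as_bytes), length($as_chars), ord($as_chars);
# prints "2 1 E9"
```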
The use bytes pragma will never turn
into a no-op. Not only is it necessary for byte-oriented code, but it
also has the side effect of defining byte-oriented wrappers around
certain functions for use outside the scope of use
bytes. As of this writing, the only defined wrapper is for
length, but there are likely to be more as time
goes by. To use such a wrapper, say:
use bytes (); # Load wrappers without importing byte semantics.
…
$charlen = length("\x{ffff_ffff}"); # Returns 1.
$bytelen = bytes::length("\x{ffff_ffff}"); # Returns 7.
Outside the scope of a use bytes declaration, Perl version 5.6 works (or at least, is ...