The Encode Module
The standard Encode module is most often used implicitly, not explicitly.
It’s loaded automatically whenever you pass an :encoding(ENC)
argument to binmode or to
open.
However, you’ll sometimes find yourself with a bit of encoded data that didn’t come from a stream whose encoding you’ve set, so you’ll have to decode it manually before you can work with it. These encoded strings might come from anywhere outside your program, like an environment variable, a program argument, a CGI parameter, or a database field. Alas, you’ll even see “text” files where some lines have one encoding but other lines have different encodings. You are guaranteed to see mojibake.
In all these situations, you’ll need to turn to the Encode module to manage encoding and
decoding more explicitly. The functions you’ll most often use from
it are, surprise, encode and
decode. If you have raw external
data that’s still in some encoded form stored as bytes, call
decode to turn that into abstract
internal characters. On the flip side, if you have abstract internal
characters and you want to convert them to some particular encoding
scheme, you call encode.
use Encode qw(encode decode);
$chars = decode("shiftjis", $bytes);
$bytes = encode("MIME-Header-ISO_2022_JP", $chars);
For example, if you knew for sure that your terminal encoding
was set to UTF-8, you could decode @ARGV this way:
# this works just like perl -CA
if (grep /\P{ASCII}/ => @ARGV) {
    @ARGV = map { decode("UTF-8", $_) } @ARGV;
}
For people ...
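To make the two directions of decode and encode concrete, here is a minimal sketch of a round trip, followed by a stricter decode for data whose encoding you are less sure about. The sample strings and the decode_or_undef helper are illustrative assumptions, not part of the Encode module. Encode::FB_CROAK is one of the module's documented CHECK values.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "café" as UTF-8 bytes: the é is the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# decode: encoded bytes in, abstract characters out.
my $chars = decode("UTF-8", $bytes);

# Five bytes became four characters.
my ($byte_count, $char_count) = (length $bytes, length $chars);

# encode: abstract characters in, bytes in a chosen encoding out.
my $latin1 = encode("ISO-8859-1", $chars);   # é is the single byte 0xE9
my $utf8   = encode("UTF-8", $chars);        # the original five bytes again

# Hypothetical helper: decode strictly, returning undef on malformed input.
# With FB_CROAK as the CHECK argument, decode dies instead of substituting
# U+FFFD. A CHECK argument also lets decode modify its source string, so the
# helper works on its own copy of the caller's data.
sub decode_or_undef {
    my ($encoding, $octets) = @_;            # $octets is a private copy
    return scalar eval { decode($encoding, $octets, Encode::FB_CROAK) };
}

my $good = decode_or_undef("UTF-8", "caf\xC3\xA9");  # decodes cleanly
my $bad  = decode_or_undef("UTF-8", "caf\xE9s");     # Latin-1 bytes, not UTF-8
```

A helper like this might be a reasonable choice when you are only fairly sure about the encoding of @ARGV or an environment variable: an undef result tells you to try another encoding or to report the problem, rather than carrying replacement characters forward.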