Processing Binary Files
Problem
Your system distinguishes between text and binary files. How do you tell Perl to treat a file as binary?
Solution
Use the binmode function on the filehandle:
binmode(HANDLE);
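As a minimal sketch of how the call fits into a complete program, the example below writes a few bytes that a text-mode layer could alter on some systems, then reads them back with binmode applied to both handles. The temporary file and the variable names ($data, $copy, $path) are illustrative assumptions, not part of the recipe.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample bytes that text-mode translation might change on some systems:
# a CR LF pair, a Ctrl-Z, and a NUL.
my $data = "A\x0D\x0AB\x1AC\x00D";

# Write the bytes to a scratch file (removed automatically at exit).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                         # no newline translation on output
print {$out} $data;
close($out) or die "close failed: $!";

# Read them back, again in binary mode.
open(my $in, '<', $path) or die "cannot open $path: $!";
binmode($in);                          # call before any read on the handle
my $copy = do { local $/; <$in> };     # slurp the entire file
close($in);

printf "wrote %d bytes, read %d bytes\n", length($data), length($copy);
```

On systems that make no distinction between text and binary files, binmode is harmless, so calling it unconditionally keeps the program portable.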
Discussion
Not everyone agrees what constitutes a line in a text file, because one person’s textual character set is another’s binary gibberish. Even when everyone is using ASCII instead of EBCDIC, Rad50, or Unicode, discrepancies arise.
As mentioned in the Introduction, there is no such thing as a newline character. It is purely virtual, a figment of the operating system, standard libraries, device drivers, and Perl.
Under Unix or Plan9, a "\n" represents the physical sequence "\cJ" (the Perl double-quote escape for Ctrl-J), a linefeed. However, on a terminal that’s not in raw mode, an Enter key generates an incoming "\cM" (a carriage return) which turns into "\cJ", whereas an outgoing "\cJ" turns into "\cM\cJ". This strangeness doesn’t happen with normal files, just terminal devices, and it is handled strictly by the device driver.
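To see which physical characters these escapes stand for on a given machine, a short script can print their character codes. This is only a sketch, and it assumes a Unix-like system with an ASCII-based character set; show_codes is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as a
# three-digit octal code, separated by spaces.
sub show_codes {
    my ($str) = @_;
    return join ' ', map { sprintf('%03o', ord($_)) } split(//, $str);
}

# "\cJ" and "\cM" always name Ctrl-J and Ctrl-M; what "\n" and "\r"
# map to is the platform-dependent part.
printf "\\n  is %s, \\cJ is %s\n", show_codes("\n"), show_codes("\cJ");
printf "\\r  is %s, \\cM is %s\n", show_codes("\r"), show_codes("\cM");
printf "\\cM\\cJ is %s\n", show_codes("\cM\cJ");
```

If the codes printed for "\n" and "\cJ" match, the system follows the Unix convention described above.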
On a Mac, a "\n" is usually represented by "\cM"; just to make life interesting (and because the standard requires that "\n" and "\r" be different), a "\r" represents a "\cJ". This is exactly the opposite of the way that Unix, Plan9, VMS, CP/M, or nearly anyone else does it. So, Mac programmers writing files for other systems or talking over a network have to be careful. If you send out "\n", you’ll deliver a "\cM", and no "\cJ" will be seen. Most network services prefer to receive and ...