Use split with a null pattern to break up the string into individual characters, or use unpack if you just want their ASCII values:
@array = split(//, $string);
@array = unpack("C*", $string);
Or extract each character in turn with a loop:
while (/(.)/g) {    # . is never a newline here
    # do something with $1
}
As we said before, Perl’s fundamental unit is the string, not the character. Needing to process anything a character at a time is rare. Usually some kind of higher-level Perl operation, like pattern matching, solves the problem more easily. See, for example, Section 7.7, where a set of substitutions is used to find command-line arguments.
Splitting on a pattern that matches the empty string returns a list of the individual characters in the string. This is a convenient feature when done intentionally, but it’s easy to do unintentionally. For instance, /X*/ matches the empty string. Odds are you will find others when you don’t mean to.
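To see how easily this happens, the following sketch compares an intentional null pattern with two patterns that merely can match the empty string. The helper name show_split is made up for this illustration, and the patterns are passed as qr// objects only to keep the helper generic.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration: join the pieces returned by
# split with ':' so the field boundaries are visible.
sub show_split {
    my ($pattern, $string) = @_;
    return join(":", split($pattern, $string));
}

print show_split(qr//,   "hello"),    "\n";   # intentional: one field per character
print show_split(qr/X*/, "hello"),    "\n";   # accidental: X* also matches the empty string
print show_split(qr/ */, "hi there"), "\n";   # the space is consumed as a separator
```

The first two calls both print h:e:l:l:o, and the third prints h:i:t:h:e:r:e, because any pattern that can match zero characters splits between every pair of characters where nothing longer matches.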
Here’s an example that prints the characters used in the string "an apple a day", sorted in ascending ASCII order:
%seen = ();
$string = "an apple a day";
foreach $byte (split //, $string) {
    $seen{$byte}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
unique chars are:  adelnpy
These split and unpack solutions give you an array of characters to work with. If you don’t want an array, you can use a pattern match with the /g flag in a while loop, extracting one character at a time:
%seen = ();
$string = "an apple a day";
while ($string =~ /(.)/g) {
    $seen{$1}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
unique chars are:  adelnpy
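One detail of the pattern-match loop is that . does not match a newline unless the /s modifier is given, so multiline strings lose their newlines in the count. The sketch below illustrates the difference; count_chars and its second argument are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: tally the characters visited by a /g match loop.
# When $include_newlines is true, the /s modifier lets . match "\n" too.
sub count_chars {
    my ($string, $include_newlines) = @_;
    my %seen;
    if ($include_newlines) {
        $seen{$1}++ while $string =~ /(.)/gs;
    } else {
        $seen{$1}++ while $string =~ /(.)/g;
    }
    return \%seen;
}

my $text    = "an apple\na day\n";
my $without = count_chars($text, 0);
my $with    = count_chars($text, 1);
print "newlines counted without /s: ", ($without->{"\n"} || 0), "\n";
print "newlines counted with /s: ",    ($with->{"\n"}    || 0), "\n";
```

Without /s the two newlines in $text are skipped; with /s they are counted like any other character.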
In general, if you find yourself doing character-by-character processing, there’s probably a better way to go about it. Instead of using index and substr or split and unpack, it might be easier to use a pattern. Instead of computing a 32-bit checksum by hand, as in the next example, the unpack function can compute it far more efficiently.
The following example calculates the checksum of $string with a foreach loop. There are better checksums; this just happens to be the basis of a traditional and computationally easy checksum. See the MD5 module from CPAN if you want a more sound checksum.
$sum = 0;
foreach $ascval (unpack("C*", $string)) {
    $sum += $ascval;
}
print "sum is $sum\n";
# prints "sum is 1248" if $string was "an apple a day"
This does the same thing, but much faster:
$sum = unpack("%32C*", $string);
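As a quick check that the two approaches agree, the sketch below wraps each in a small function and runs both on the same string. The function names checksum_loop and checksum_unpack are invented for this illustration.

```perl
use strict;
use warnings;

# Add up the character values one at a time, as in the foreach version.
sub checksum_loop {
    my ($string) = @_;
    my $sum = 0;
    $sum += $_ for unpack("C*", $string);
    return $sum;
}

# Let unpack compute the 32-bit checksum in a single call.
sub checksum_unpack {
    my ($string) = @_;
    return unpack("%32C*", $string);
}

my $string = "an apple a day";
printf "loop: %d, unpack: %d\n",
    checksum_loop($string), checksum_unpack($string);
```

Both functions return 1248 for this string. The results would differ only if the plain sum exceeded 32 bits, since the %32 prefix keeps just the low 32 bits of the total.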
This lets us emulate the SysV checksum program:
#!/usr/bin/perl
# sum - compute 16-bit checksum of all input files
$checksum = 0;
while (<>) { $checksum += unpack("%16C*", $_) }
$checksum %= (2 ** 16) - 1;
print "$checksum\n";
Here’s an example of its use:
% perl sum /etc/termcap
1510
If you have the GNU version of sum, you’ll need to call it with the --sysv option to get the same answer on the same file.
% sum --sysv /etc/termcap
1510 851 /etc/termcap
Another tiny program that processes its input one character at a time is slowcat, shown in Example 1.1. The idea here is to pause after each character is printed so you can scroll text before an audience slowly enough that they can read it.
See also the split and unpack functions in perlfunc(1) and Chapter 3 of Programming Perl; the use of expanding select for timing is explained in Section 3.10.