You want to access or modify just a portion of a string, not the whole thing. For instance, you’ve read a fixed-width record and want to extract the individual fields.
The substr function lets you read from and write to parts of a string.
$value = substr($string, $offset, $count);
$value = substr($string, $offset);

substr($string, $offset, $count) = $newstring;
substr($string, $offset)         = $newtail;
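To make the four forms concrete, here is a minimal sketch that reads and rewrites fields of a fixed-width record. The 20-byte layout (a 6-byte ID, an 8-byte date, and a 6-byte amount) and the sample values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: 6-byte ID, 8-byte date, 6-byte amount
my $record = "A00042" . "20240115" . "001250";

# Read access: offset and count select each field
my $id     = substr($record, 0, 6);    # "A00042"
my $date   = substr($record, 6, 8);    # "20240115"
my $amount = substr($record, 14);      # "001250" (everything from offset 14 on)

# Write access: assigning to substr replaces that part of $record in place
substr($record, 6, 8) = "20240201";    # overwrite the date field
substr($record, 14)   = "009999";      # replace the tail

print "$id $date $amount\n";           # A00042 20240115 001250
print "$record\n";                     # A0004220240201009999
```

The values read before the assignments are copies, so $date still holds the old date after the record is rewritten.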
The unpack function gives only read access, but is faster when you have many substrings to extract.
# get a 5-byte string, skip 3, then grab 2 8-byte strings, then the rest
($leading, $s1, $s2, $trailing) =
    unpack("A5 x3 A8 A8 A*", $data);

# split at five-byte boundaries
@fivers = unpack("A5" x (length($string)/5), $string);

# chop string into individual characters
# ("a1" keeps spaces; "A1" would turn each space into an empty string)
@chars  = unpack("a1" x length($string), $string);
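Here is how those templates behave on sample input. The $data record below is hypothetical, built to fit the "A5 x3 A8 A8 A*" template. Note that "A" strips trailing spaces from each field, which is usually what you want for fixed-width records but not when every byte matters.

```perl
use strict;
use warnings;

# Hypothetical sample record matching the "A5 x3 A8 A8 A*" template
my $data = "Alpha" . "---" . "Bravo   " . "Charlie " . "the rest";
my ($leading, $s1, $s2, $trailing) = unpack("A5 x3 A8 A8 A*", $data);
# "A" strips trailing spaces, so $s1 is "Bravo", not "Bravo   "
print join("|", $leading, $s1, $s2, $trailing), "\n";   # Alpha|Bravo|Charlie|the rest

# length/5 is 2.4 here; "x" truncates the repeat count to 2,
# so the final partial chunk ("kl") is not returned
my $string = "abcdefghijkl";
my @fivers = unpack("A5" x (length($string) / 5), $string);
print "@fivers\n";                                       # abcde fghij

# "a1" keeps the space as a real one-character element
my @chars = unpack("a1" x length("a b"), "a b");
print "[$chars[1]]\n";                                   # [ ]
```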
Unlike many other languages, which represent strings as arrays of bytes (or characters), Perl treats strings as a basic data type. This means that you must use functions like unpack or substr to access individual characters or a portion of the string.
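For instance, to walk a string one character at a time, you index it with substr rather than with array subscripts. A minimal sketch:

```perl
use strict;
use warnings;

# $string[0] would name element 0 of a separate array @string,
# so individual characters of a scalar are reached through substr
my $string = "Perl";
my @seen;
for my $i (0 .. length($string) - 1) {
    push @seen, substr($string, $i, 1);   # the one character at position $i
}
print join(",", @seen), "\n";             # P,e,r,l
```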
The offset argument to substr indicates the start of the substring you're interested in, counting from the front if positive and from the end if negative. If offset is 0, the substring starts at the beginning. The count argument is the length of the substring; if you omit it, the substring runs to the end of the string.
$string = "This is what you have";
#         +012345678901234567890  Indexing forwards  (left to right)
#          109876543210987654321- Indexing backwards (right to left)
#           note that 0 means 10 or 20, etc. above

$first = substr($string, 0, 1);   # "T"
$start = substr($string, 5, 2);   # "is"
$rest  = substr($string, 13);     # "you have"
$last  = substr($string, -1);     # "e"
$end   = substr($string, -4);     # "have"
$piece = substr($string, -8, 3);  # "you"
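A quick way to convince yourself of the negative-offset rule is to check that substr($string, -$n) always equals substr($string, length($string) - $n). The sketch below tests that for a few arbitrarily chosen values of $n.

```perl
use strict;
use warnings;

my $string = "This is what you have";
my $len    = length($string);             # 21

# A negative offset counts back from the end: -$n is the same as $len - $n
for my $n (1, 4, 8) {
    my $from_end   = substr($string, -$n);
    my $from_front = substr($string, $len - $n);
    die "mismatch for $n" unless $from_end eq $from_front;
}

# The same holds with a count: offset -8 is position 13
print substr($string, -8, 3), " ", substr($string, 13, 3), "\n";   # you you
```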
You can do more than just look at parts of the string with substr; you can actually change them. That's because substr is a particularly odd kind of function: an lvaluable one, that is, a function that may itself be assigned a value. (For the record, the others are vec, pos, and, as of the 5.004 release, keys. If you squint, local and my can also be viewed as lvaluable functions.)
$string = "This is what you have";
print "$string\n";                   # This is what you have
substr($string, 5, 2) = "wasn't";    # change "is" to "wasn't"
print "$string\n";                   # This wasn't what you have
substr($string, -12)  = "ondrous";   # replace the last 12 characters
print "$string\n";                   # This wasn't wondrous
substr($string, 0, 1) = "";          # delete first character
print "$string\n";                   # his wasn't wondrous
substr($string, -10)  = "";          # delete last 10 characters
print "$string\n";                   # his wasn'
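Because substr is lvaluable, you can also alias its result instead of assigning to it directly. This sketch relies on foreach aliasing $_ to the lvalue that substr returns (perlfunc demonstrates the same technique), and it also assigns to pos, another of the lvaluable functions mentioned above, to choose where the next //g match begins. The sample strings are arbitrary.

```perl
use strict;
use warnings;

# foreach aliases $_ to the lvalue returned by substr,
# so assigning to $_ rewrites that slice of $string
my $string = "this is what you have";
for (substr($string, 0, 4)) {
    $_ = uc $_;                       # "this" becomes "THIS" inside $string
}
print "$string\n";                    # THIS is what you have

# pos is lvaluable too: set where the next //g match starts
my $text = "aaa bbb ccc";
pos($text) = 4;
my $word = $text =~ /\G(\w+)/g ? $1 : undef;
print "$word\n";                      # bbb
```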
You can use the =~ operator and the s///, m//, or tr/// operators in conjunction with substr to make them affect only that portion of the string.
# you can test substrings with =~
if (substr($string, -10) =~ /pattern/) {
    print "Pattern matches in last 10 characters\n";
}

# substitute "at" for "is", restricted to first five characters
substr($string, 0, 5) =~ s/is/at/g;
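Here is the same idea with tr/// and s///g on a concrete, made-up string, showing that characters outside the selected substring are left alone. The last statement uses tr with an empty replacement list, which counts matching characters without changing them.

```perl
use strict;
use warnings;

my $string = "this is his list";

# tr/// bound to substr affects only the first 7 characters
my $upper = $string;
substr($upper, 0, 7) =~ tr/a-z/A-Z/;
print "$upper\n";                          # THIS IS his list

# s///g bound to substr: only the "is" inside the first five characters changes
my $copy = $string;
substr($copy, 0, 5) =~ s/is/at/g;
print "$copy\n";                           # that is his list

# tr with an empty replacement list just counts: vowels in "list"
my $n = (substr($string, -4) =~ tr/aeiou//);
print "$n\n";                              # 1
```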
You can even swap values by using several substr calls on each side of an assignment:
# exchange the first and last letters in a string
$a = "make a hat";
(substr($a,0,1), substr($a,-1)) = (substr($a,-1), substr($a,0,1));
print $a;
take a ham
Although unpack is not lvaluable, it is considerably faster than substr when you extract numerous values at once. It doesn't directly support offsets as substr does. Instead, it uses a lowercase "x" with a count to skip forward some number of bytes and an uppercase "X" with a count to skip backward some number of bytes.
# extract column with unpack
$a = "To be or not to be";
$b = unpack("x6 A6", $a);              # skip 6, grab 6
print "$b\n";                          # or not

($b, $c) = unpack("x6 A2 X5 A2", $a);  # forward 6, grab 2; backward 5, grab 2
print "$b\n$c\n";                      # or
                                       # be
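To see how the "x" skip corresponds to a substr offset, compare the two on the same string. This sketch also shows the one behavioral difference to watch for: "A" strips trailing spaces, while substr returns the bytes exactly as they are.

```perl
use strict;
use warnings;

my $str = "To be or not to be";

# "x6 A6" reads the same bytes as substr($str, 6, 6)
my $via_unpack = unpack("x6 A6", $str);
my $via_substr = substr($str, 6, 6);
print "[$via_unpack] [$via_substr]\n";    # [or not] [or not]

# The difference: "A" strips trailing spaces, substr does not
my $u = unpack("x6 A3", $str);            # "or " becomes "or"
my $s = substr($str, 6, 3);               # keeps "or "
print "[$u] [$s]\n";                      # [or] [or ]
```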
Sometimes you prefer to think of your data as being cut up at specific columns. For example, you might want to place cuts right before positions 8, 14, 20, 26, and 30. Those are the column numbers where each field begins. Although you could calculate that the proper unpack format is "A7 A6 A6 A6 A4 A*", this is too much mental strain for the virtuously lazy Perl programmer. Let Perl figure it out for you. Use the cut2fmt function below:
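sub cut2fmt {
    my (@positions) = @_;      # 1-based columns where each new field begins
    my $template = '';
    my $lastpos  = 1;
    foreach my $place (@positions) {
        # width of the field that ends just before column $place
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";         # everything after the last cut
    return $template;
}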
$fmt = cut2fmt(8, 14, 20, 26, 30);
print "$fmt\n";
A7 A6 A6 A6 A4 A*
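Putting the pieces together, you can feed cut2fmt's result straight to unpack. In this sketch the input line is hypothetical, padded so that its fields begin at columns 1, 8, 14, 20, 26, and 30; cut2fmt is repeated (with my on the loop variable) so the block runs on its own under strict.

```perl
use strict;
use warnings;

sub cut2fmt {
    my @positions = @_;
    my $template  = '';
    my $lastpos   = 1;
    foreach my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

# Hypothetical fixed-width line: field widths 7, 6, 6, 6, 4, then the rest
my $line = "Smith  " . "Alice " . "Bob   " . "Carol " . "42  " . "rest of line";
my @fields = unpack(cut2fmt(8, 14, 20, 26, 30), $line);
print join("|", @fields), "\n";   # Smith|Alice|Bob|Carol|42|rest of line
```

Building the template once and reusing it for every line of a file avoids recomputing it per record.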
The powerful unpack function goes far beyond mere text processing. It's the gateway between text and binary data.
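As a small taste of the binary side, the sketch below uses pack to build six raw bytes from two integers and unpack to recover them, then views the same bytes as hex text. The numbers are arbitrary; "n" and "N" are the standard 16-bit and 32-bit big-endian (network order) formats.

```perl
use strict;
use warnings;

# Pack two integers in network (big-endian) order: "n" = 16-bit, "N" = 32-bit
my $binary = pack("n N", 258, 65536);
print length($binary), "\n";                 # 6

# unpack turns the raw bytes back into numbers
my ($short, $long) = unpack("n N", $binary);
print "$short $long\n";                      # 258 65536

# The same bytes viewed as hex text
print unpack("H*", $binary), "\n";           # 010200010000
```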
See also the unpack and substr functions in perlfunc(1) and Chapter 3 of Programming Perl; the cut2fmt subroutine of Section 1.18; the binary use of unpack in Section 8.18.