Chapter 1. Strings
Introduction
Strings in PHP are sequences of bytes, such as “We hold these truths to be self-evident” or “Once upon a time” or even “111211211.” When you read data from a file or output it to a web browser, your data are represented as strings.
PHP strings are binary-safe (i.e., they can contain null bytes) and can grow and shrink on demand. Their size is limited only by the amount of memory that is available to PHP.
Warning
Usually, PHP strings are ASCII strings. You must do extra work to handle non-ASCII data such as UTF-8 or other multibyte character encodings; see Chapter 19.
As in Perl and the Unix shell, strings can be initialized in three ways: with single quotes, with double quotes, and with the “here document” (heredoc) format. With single-quoted strings, the only special characters you need to escape inside a string are backslash and the single quote itself. Example 1-1 shows four single-quoted strings.
print 'I have gone to the store.';
print 'I\'ve gone to the store.';
print 'Would you pay $1.75 for 8 ounces of tap water?';
print 'In double-quoted strings, newline is represented by \n';
Example 1-1 prints:
I have gone to the store. I've gone to the store. Would you pay $1.75 for 8 ounces of tap water? In double-quoted strings, newline is represented by \n
Because PHP doesn’t check for variable interpolation or almost any escape sequences in single-quoted strings, defining strings this way is straightforward and fast.
Double-quoted strings don’t recognize escaped single quotes, but they do recognize interpolated variables and the escape sequences shown in Table 1-1.
Example 1-2 shows some double-quoted strings.
print "I've gone to the store.";
print "The sauce cost \$10.25.";
$cost = '$10.25';
print "The sauce cost $cost.";
print "The sauce cost \$\061\060.\x32\x35.";
Example 1-2 prints:
I've gone to the store. The sauce cost $10.25. The sauce cost $10.25. The sauce cost $10.25.
The last line of Example 1-2 prints the price of sauce correctly because the character 1 is ASCII code 49 decimal and 061 octal. Character 0 is ASCII 48 decimal and 060 octal; 2 is ASCII 50 decimal and 32 hex; and 5 is ASCII 53 decimal and 35 hex.
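The conversions above can be checked directly: ord() returns a byte’s decimal value, and decoct() and dechex() convert that value to octal and hexadecimal. A minimal sketch:

```php
<?php
// ord() gives the decimal value of a byte; decoct()/dechex() convert it
print ord('1') . ' ' . decoct(ord('1')) . "\n"; // 49 61
print ord('5') . ' ' . dechex(ord('5')) . "\n"; // 53 35

// Octal (\0nn) and hex (\xnn) escapes in double quotes produce the same bytes
var_dump("\061\060.\x32\x35" === '10.25'); // bool(true)
```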
Heredoc-specified strings recognize all the interpolations and escapes of double-quoted strings, but they don’t require double quotes to be escaped. Heredocs start with <<< and a token. That token (with no leading or trailing whitespace), followed by a semicolon to end the statement (if necessary), ends the heredoc. Example 1-3 shows how to define a heredoc.
print <<< END
It's funny when signs say things like:
Original "Root" Beer
"Free" Gift
Shoes cleaned while "you" wait
or have other misquoted words.
END;
Example 1-3 prints:
It's funny when signs say things like:
Original "Root" Beer
"Free" Gift
Shoes cleaned while "you" wait
or have other misquoted words.
Newlines, spacing, and quotes are all preserved in a heredoc. By convention, the end-of-string identifier is usually all caps, and it is case sensitive. Example 1-4 shows two more valid heredocs.
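print <<< PARSLEY
It's easy to grow fresh:
Parsley
Chives
on your windowsill
PARSLEY;

print <<< DOGS
If you like pets, yell out:
DOGS AND CATS ARE GREAT!
DOGS;
The second heredoc is valid only in PHP versions before 7.3. Starting with PHP 7.3, a line that begins with the closing identifier followed by a non-identifier character (here, DOGS followed by a space) ends the heredoc, so the rest of that line causes a parse error.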
Heredocs are especially useful for printing out HTML with interpolated variables, since you don’t have to escape the double quotes that appear in the HTML elements. Example 1-5 uses a heredoc to print HTML.
if ($remaining_cards > 0) {
    $url = '/deal.php';
    $text = 'Deal More Cards';
} else {
    $url = '/new-game.php';
    $text = 'Start a New Game';
}

print <<< HTML
There are <b>$remaining_cards</b> left.
<p>
<a href="$url">$text</a>
HTML;
In Example 1-5, the semicolon needs to go after the end-of-string delimiter to tell PHP the statement is ended. In some cases, however, you shouldn’t use the semicolon. One of these cases is shown in Example 1-6, which uses a heredoc with the string concatenation operator (.).
$html = <<< END
<div class="$divClass">
<ul class="$ulClass">
<li>
END
. $listItem . '</li></ul></div>';

print $html;
Assuming some reasonable values for the $divClass, $ulClass, and $listItem variables, Example 1-6 prints:
<div class="class1">
<ul class="class2">
<li>The List Item</li></ul></div>
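In Example 1-6, the expression needs to continue on the next line, so you don’t use a semicolon. Note also that in order for PHP to recognize the end-of-string delimiter, the . string concatenation operator needs to go on a separate line from the end-of-string delimiter. (PHP 7.3 and later allow the closing delimiter to be followed by other code on the same line, but putting the operator on its own line works in every version.)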
Individual bytes in strings can be referenced with square brackets. The first byte in the string is at index 0. Example 1-7 grabs one byte from a string.
Example 1-7 prints:
d
You can also use curly braces to access individual bytes in a string: in older versions of PHP, $neighbor{3} is the same as $neighbor[3]. However, the curly brace syntax was deprecated in PHP 7.4 and removed in PHP 8.0, so use square brackets in new code.
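A minimal sketch of byte access with square brackets. The value of $neighbor is hypothetical, chosen so that index 3 holds d; the negative index assumes PHP 7.1 or later:

```php
<?php
// Hypothetical value, chosen so that the byte at index 3 is "d"
$neighbor = 'Hilda';

print $neighbor[0] . "\n";  // H (indexes start at 0)
print $neighbor[3] . "\n";  // d

// Since PHP 7.1, a negative index counts back from the end of the string
print $neighbor[-1] . "\n"; // a
```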
1.1. Accessing Substrings
Problem
You want to know if a string contains a particular substring. For example, you want to find out if an email address contains an @.
Solution
Use strpos(), as in Example 1-8.
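A minimal sketch of the strpos() check described in this recipe, using a hypothetical $email value:

```php
<?php
// Hypothetical address to test
$email = 'alice@example.com';

// strpos() returns an int position or false; compare with === so that
// a match at position 0 is not mistaken for "not found"
if (strpos($email, '@') === false) {
    print "There was no @ in the e-mail address!\n";
} else {
    print 'Found an @ at position ' . strpos($email, '@') . "\n"; // 5
}
```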
Discussion
The return value from strpos() is the first position in the string (the “haystack”) at which the substring (the “needle”) was found. If the needle wasn’t found at all in the haystack, strpos() returns false. If the needle is at the beginning of the haystack, strpos() returns 0, since position 0 represents the beginning of the string. To differentiate between return values of 0 and false, you must use the identity operator (===) or the not-identity operator (!==) instead of regular equals (==) or not-equals (!=). Example 1-8 compares the return value from strpos() to false using ===. This test only succeeds if strpos() returns false, not if it returns 0 or any other number.
See Also
Documentation on strpos() at http://www.php.net/strpos.
1.2. Extracting Substrings
Problem
You want to extract part of a string, starting at a particular place in the string. For example, you want the first eight characters of a username entered into a form.
Solution
Use substr() to select your substring, as in Example 1-9.
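A minimal sketch of taking the first eight characters of a username, assuming a hypothetical $username value:

```php
<?php
// Hypothetical form input; in a real script this might come from $_POST
$username = 'nathaniel_hawthorne';

// Start at position 0 and take at most 8 bytes
$short_name = substr($username, 0, 8);
print $short_name . "\n"; // nathanie
```

Shorter input is returned whole, so no length check is needed first.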
Discussion
If $start and $length are positive, substr() returns $length characters in the string, starting at $start. The first character in the string is at position 0. Example 1-10 has positive $start and $length.
print substr('watch out for that tree',6,5);
Example 1-10 prints:
out f
If you leave out $length, substr() returns the string from $start to the end of the original string, as shown in Example 1-11.
print substr('watch out for that tree',17);
Example 1-11 prints:
t tree
If $start is bigger than the length of the string, substr() returns false. (In PHP 8.0 and later, it returns an empty string instead.)
If $start plus $length goes past the end of the string, substr() returns all of the string from $start forward, as shown in Example 1-12.
print substr('watch out for that tree',20,5);
Example 1-12 prints:
ree
If $start is negative, substr() counts back from the end of the string to determine where your substring starts, as shown in Example 1-13.
print substr('watch out for that tree',-6);
print substr('watch out for that tree',-17,5);
Example 1-13 prints:
t tree out f
With a negative $start value that goes past the beginning of the string (for example, if $start is −27 with a 20-character string), substr() behaves as if $start is 0.
If $length is negative, substr() counts back from the end of the string to determine where your substring ends, as shown in Example 1-14.
print substr('watch out for that tree',15,-2);
print substr('watch out for that tree',-4,-1);
Example 1-14 prints:
hat tr tre
See Also
Documentation on substr() at http://www.php.net/substr.
1.3. Replacing Substrings
Problem
You want to replace a substring with a different string. For example, you want to obscure all but the last four digits of a credit card number before printing it.
Solution
Use substr_replace(), as in Example 1-15.
// Everything from position $start to the end of $old_string
// becomes $new_substring
$new_string = substr_replace($old_string,$new_substring,$start);

// $length characters, starting at position $start, become $new_substring
$new_string = substr_replace($old_string,$new_substring,$start,$length);
Discussion
Without the $length argument, substr_replace() replaces everything from $start to the end of the string. If $length is specified, only that many characters are replaced:
print substr_replace('My pet is a blue dog.','fish.',12);
print substr_replace('My pet is a blue dog.','green',12,4);
$credit_card = '4111 1111 1111 1111';
print substr_replace($credit_card,'xxxx ',0,strlen($credit_card)-4);
This prints:
My pet is a fish. My pet is a green dog. xxxx 1111
If $start is negative, the new substring is placed at $start characters counting from the end of $old_string, not from the beginning:
print substr_replace('My pet is a blue dog.','fish.',-9);
print substr_replace('My pet is a blue dog.','green',-9,4);
This prints:
My pet is a fish. My pet is a green dog.
If $start and $length are 0, the new substring is inserted at the start of $old_string:
print substr_replace('My pet is a blue dog.','Title: ',0,0);
This prints:
Title: My pet is a blue dog.
The function substr_replace() is useful when you’ve got text that’s too big to display all at once, and you want to display some of the text with a link to the rest. Example 1-16 displays the first 25 characters of a message with an ellipsis after it as a link to a page that displays more text.
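// The mysql_* functions were removed in PHP 7.0; use mysqli or PDO there.
// intval() keeps a non-numeric $id from being injected into the SQL.
$r = mysql_query('SELECT id,message FROM messages WHERE id = ' . intval($id)) or die();
$ob = mysql_fetch_object($r);
// Escape the message text before putting it into HTML
printf('<a href="more-text.php?id=%d">%s</a>',
       $ob->id, htmlspecialchars(substr_replace($ob->message,' ...',25)));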
The more-text.php page referenced in Example 1-16 can use the message ID passed in the query string to retrieve the full message and display it.
See Also
Documentation on substr_replace() at http://www.php.net/substr-replace.
1.4. Processing a String One Byte at a Time
Solution
Loop through each byte in the string with for. Example 1-17 counts the vowels in a string.
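A minimal sketch of counting vowels one byte at a time with for, using a hypothetical input string and treating only a, e, i, o, and u (either case) as vowels:

```php
<?php
// Hypothetical input string
$string = 'Look and Say';
$vowels = 0;

for ($i = 0, $j = strlen($string); $i < $j; $i++) {
    // strpos() finds the byte in the vowel list, or returns false
    if (strpos('aeiouAEIOU', $string[$i]) !== false) {
        $vowels++;
    }
}
print $vowels . "\n"; // 4
```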
Discussion
Processing a string a character at a time is an easy way to calculate the “Look and Say” sequence, as shown in Example 1-18.
<?php
function lookandsay($s) {
    // initialize the return value to the empty string
    $r = '';
    // $m holds the character we're counting, initialize to the first
    // character in the string
    $m = $s[0];
    // $n is the number of $m's we've seen, initialize to 1
    $n = 1;
    for ($i = 1, $j = strlen($s); $i < $j; $i++) {
        // if this character is the same as the last one
        if ($s[$i] == $m) {
            // increment the count of this character
            $n++;
        } else {
            // otherwise, add the count and character to the return value
            $r .= $n.$m;
            // set the character we're looking for to the current one
            $m = $s[$i];
            // and reset the count to 1
            $n = 1;
        }
    }
    // return the built up string as well as the last count and character
    return $r.$n.$m;
}

// Start from the string '1' (an integer can't be indexed like a string)
// and print each element before computing the next one
for ($i = 0, $s = '1'; $i < 10; $i++) {
    print "$s <br/>\n";
    $s = lookandsay($s);
}
Example 1-18 prints:
1 11 21 1211 111221 312211 13112221 1113213211 31131211131221 13211311123113112211
It’s called the “Look and Say” sequence because each element is what you get by looking at the previous element and saying what’s in it. For example, looking at the first element, 1, you say “one one.” So the second element is “11.” That’s two ones, so the third element is “21.” Similarly, that’s one two and one one, so the fourth element is “1211,” and so on.
See Also
Documentation on for at http://www.php.net/for; more about the “Look and Say” sequence at http://mathworld.wolfram.com/LookandSaySequence.html.
1.5. Reversing a String by Word or Byte
Solution
Use strrev() to reverse by byte, as in Example 1-19.
Example 1-19 prints:
.emordnilap a ton si sihT
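Reversing the output above recovers the input string, “This is not a palindrome.” A minimal sketch of the call:

```php
<?php
// strrev() reverses the string byte by byte
print strrev('This is not a palindrome.') . "\n"; // .emordnilap a ton si sihT
```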
To reverse by words, explode the string by word boundary, reverse the words, and then rejoin, as in Example 1-20.
<?php
$s = "Once upon a time there was a turtle.";
// break the string up into words
$words = explode(' ',$s);
// reverse the array of words
$words = array_reverse($words);
// rebuild the string
$s = implode(' ',$words);
print $s;
?>
Example 1-20 prints:
turtle. a was there time a upon Once
Discussion
Reversing a string by words can also be done all in one line with the code in Example 1-21.
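A one-line version nests the three calls from Example 1-20 inside each other. This is a sketch of the idea and is not necessarily identical to Example 1-21:

```php
<?php
$s = "Once upon a time there was a turtle.";
// explode into words, reverse the array, and implode, all in one expression
$reversed_s = implode(' ', array_reverse(explode(' ', $s)));
print $reversed_s . "\n"; // turtle. a was there time a upon Once
```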
See Also
Recipe 23.7 discusses the implications of using something other than a space character as your word boundary; documentation on strrev() at http://www.php.net/strrev and array_reverse() at http://www.php.net/array-reverse.
1.6. Expanding and Compressing Tabs
Problem
You want to change spaces to tabs (or tabs to spaces) in a string while keeping text aligned with tab stops. For example, you want to display formatted text to users in a standardized way.
Solution
Use str_replace() to switch spaces to tabs or tabs to spaces, as shown in Example 1-22.
<?php
// The mysql_* functions were removed in PHP 7.0; use mysqli or PDO there
$r = mysql_query("SELECT message FROM messages WHERE id = 1") or die();
$ob = mysql_fetch_object($r);
$tabbed = str_replace(' ',"\t",$ob->message);
$spaced = str_replace("\t",' ',$ob->message);

print "With Tabs: <pre>$tabbed</pre>";
print "With Spaces: <pre>$spaced</pre>";
?>
Using str_replace() for conversion, however, doesn’t respect tab stops. If you want tab stops every eight characters, a line beginning with a five-letter word and a tab should have that tab replaced with three spaces, not one. Use the pc_tab_expand() function shown in Example 1-23 to turn tabs to spaces in a way that respects tab stops.
<?php
function pc_tab_expand($text) {
    while (strstr($text,"\t")) {
        $text = preg_replace_callback('/^([^\t\n]*)(\t+)/m',
                                      'pc_tab_expand_helper', $text);
    }
    return $text;
}

function pc_tab_expand_helper($matches) {
    $tab_stop = 8;

    return $matches[1] .
           str_repeat(' ', strlen($matches[2]) * $tab_stop -
                           (strlen($matches[1]) % $tab_stop));
}

$spaced = pc_tab_expand($ob->message);
?>
You can use the pc_tab_unexpand() function shown in Example 1-24 to turn spaces back to tabs.
<?php
function pc_tab_unexpand($text) {
    $tab_stop = 8;
    $lines = explode("\n",$text);
    foreach ($lines as $i => $line) {
        // Expand any tabs to spaces
        $line = pc_tab_expand($line);
        $chunks = str_split($line, $tab_stop);
        $chunkCount = count($chunks);
        // Scan all but the last chunk
        for ($j = 0; $j < $chunkCount - 1; $j++) {
            $chunks[$j] = preg_replace('/ {2,}$/',"\t",$chunks[$j]);
        }
        // If the last chunk is a tab-stop's worth of spaces
        // convert it to a tab; Otherwise, leave it alone
        if ($chunks[$chunkCount-1] == str_repeat(' ', $tab_stop)) {
            $chunks[$chunkCount-1] = "\t";
        }
        // Recombine the chunks
        $lines[$i] = implode('',$chunks);
    }
    // Recombine the lines
    return implode("\n",$lines);
}

$tabbed = pc_tab_unexpand($ob->message);
?>
Both functions take a string as an argument and return the string appropriately modified.
Discussion
Each function assumes tab stops are every eight spaces, but that can be modified by changing the setting of the $tab_stop variable.
The regular expression in pc_tab_expand() matches both a group of tabs and all the text in a line before that group of tabs. It needs to match the text before the tabs because the length of that text affects how many spaces the tabs should be replaced with so that subsequent text is aligned with the next tab stop. The function doesn’t just replace each tab with eight spaces; it adjusts text after tabs to line up with tab stops.
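To make the arithmetic concrete, here is a hypothetical helper, expand_line_tabs(), that expands tabs in a single line by walking it one byte at a time instead of using a regular expression. Each tab becomes $tab_stop minus (current column modulo $tab_stop) spaces, so a tab after the five-letter abcde becomes three spaces. It assumes PHP 7 or later for the type declarations and input without newlines:

```php
<?php
// Hypothetical helper (not part of the recipe): expands tabs in one line
// by tracking the current output column byte by byte
function expand_line_tabs(string $line, int $tab_stop = 8): string {
    $out = '';
    for ($i = 0, $j = strlen($line); $i < $j; $i++) {
        if ($line[$i] === "\t") {
            // Pad with enough spaces to reach the next multiple of $tab_stop
            $out .= str_repeat(' ', $tab_stop - (strlen($out) % $tab_stop));
        } else {
            $out .= $line[$i];
        }
    }
    return $out;
}

print expand_line_tabs("abcde\tX") . "\n"; // "abcde", 3 spaces, "X"
```

A tab that starts exactly on a tab stop (for example, after eight characters) expands to a full eight spaces, matching the behavior of pc_tab_expand_helper().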
Similarly, pc_tab_unexpand() doesn’t just look for eight consecutive spaces and then replace them with one tab character. It divides up each line into eight-character chunks and then substitutes ending whitespace in those chunks (at least two spaces) with tabs. This not only preserves text alignment with tab stops; it also saves space in the string.
See Also
Documentation on str_replace() at http://www.php.net/str-replace, on preg_replace_callback() at http://www.php.net/preg_replace_callback, and on str_split() at http://www.php.net/str_split. Recipe 22.10 has more information on preg_replace_callback().
1.7. Controlling Case
Problem
You need to capitalize, lowercase, or otherwise modify the case of letters in a string. For example, you want to capitalize the initial letters of names but lowercase the rest.
Solution
Use ucfirst() or ucwords() to capitalize the first letter of one or more words, as shown in Example 1-25.
<?php
print ucfirst("how do you do today?");
print ucwords("the prince of wales");
?>
Example 1-25 prints:
How do you do today? The Prince Of Wales
Use strtolower() or strtoupper() to modify the case of entire strings, as in Example 1-26.
print strtoupper("i'm not yelling!");
// Tags must be lowercase to be XHTML compliant
print strtolower('<A HREF="one.php">one</A>');
Example 1-26 prints:
I'M NOT YELLING! <a href="one.php">one</a>
Discussion
Use ucfirst() to capitalize the first character in a string:
<?php
print ucfirst('monkey face');
print ucfirst('1 monkey face');
?>
This prints:
Monkey face 1 monkey face
Note that the second phrase is not “1 Monkey face.”
Use ucwords() to capitalize the first character of each word in a string:
<?php
print ucwords('1 monkey face');
print ucwords("don't play zone defense against the philadelphia 76-ers");
?>
This prints:
1 Monkey Face Don't Play Zone Defense Against The Philadelphia 76-ers
As expected, ucwords() doesn’t capitalize the “t” in “don’t.” But it also doesn’t capitalize the “e” in “76-ers.” For ucwords(), a word is any sequence of nonwhitespace characters that follows one or more whitespace characters. Since neither ' nor - is a whitespace character, ucwords() doesn’t consider the “t” in “don’t” or the “e” in “76-ers” to be word-starting characters.
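Newer versions of PHP (5.4.32 and 5.5.16 onward) accept an optional second argument to ucwords() listing the characters that start a new word. A minimal sketch that adds the hyphen to the default space delimiter:

```php
<?php
// " -" means a word starts after a space or after a hyphen
print ucwords("don't play zone defense against the philadelphia 76-ers", " -") . "\n";
// Don't Play Zone Defense Against The Philadelphia 76-Ers
```

The apostrophe is still not a delimiter, so the “t” in “Don’t” stays lowercase.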
Neither ucfirst() nor ucwords() changes the case of non-first letters:
<?php
print ucfirst('macWorld says I should get an iBook');
print ucwords('eTunaFish.com might buy itunaFish.Com!');
?>
This prints:
MacWorld says I should get an iBook ETunaFish.com Might Buy ItunaFish.Com!
The functions strtolower() and strtoupper() work on entire strings, not just individual characters. All alphabetic characters are changed to lowercase by strtolower(), and strtoupper() changes all alphabetic characters to uppercase:
<?php
print strtolower("I programmed the WOPR and the TRS-80.");
print strtoupper('"since feeling is first" is a poem by e. e. cummings.');
?>
This prints:
i programmed the wopr and the trs-80. "SINCE FEELING IS FIRST" IS A POEM BY E. E. CUMMINGS.
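When determining upper- and lowercase, these functions respect your locale settings in PHP versions before 8.2. Since PHP 8.2, they perform ASCII-only case conversion regardless of locale.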
See Also
For more information about locale settings, see Chapter 19; documentation on ucfirst() at http://www.php.net/ucfirst, ucwords() at http://www.php.net/ucwords, strtolower() at http://www.php.net/strtolower, and strtoupper() at http://www.php.net/strtoupper.
1.8. Interpolating Functions and Expressions Within Strings
Solution
Use the string concatenation operator (.), as shown in Example 1-27, when the value you want to include can’t be inside the string.
<?php
print 'You have '.($_REQUEST['boys'] + $_REQUEST['girls']).' children.';
print "The word '$word' is ".strlen($word).' characters long.';
print 'You owe '.$amounts['payment'].' immediately';
print "My circle's diameter is ".$circle->getDiameter().' inches.';
?>
Discussion
You can put variables, object properties, and array elements (if the subscript is unquoted) directly in double-quoted strings:
<?php
print "I have $children children.";
print "You owe $amounts[payment] immediately.";
print "My circle's diameter is $circle->diameter inches.";
?>
Interpolation with double-quoted strings places some limitations on the syntax of what can be interpolated. In the previous example, $amounts['payment'] had to be written as $amounts[payment] so it would be interpolated properly. Use curly braces around more complicated expressions to interpolate them into a string. For example:
<?php
print "I have less than {$children} children.";
print "You owe {$amounts['payment']} immediately.";
print "My circle's diameter is {$circle->getDiameter()} inches.";
?>
Direct interpolation or using string concatenation also works with heredocs. Interpolating with string concatenation in heredocs can look a little strange because the closing heredoc delimiter and the string concatenation operator have to be on separate lines:
<?php
print <<< END
Right now, the time is 
END
. strftime('%c') . <<< END
 but tomorrow it will be 
END
. strftime('%c',time() + 86400);
?>
Also, if you’re interpolating with heredocs, make sure to include appropriate spacing for the whole string to appear properly. In the previous example, “Right now, the time is” has to include a trailing space, and “but tomorrow it will be” has to include leading and trailing spaces.
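Curly-brace interpolation of variables, array elements, and method calls also works inside a heredoc; only plain function calls such as strftime() need the concatenation shown above. A minimal sketch with a hypothetical $amounts array:

```php
<?php
// Hypothetical data for illustration
$amounts = array('payment' => '$42.00');

// {$...} works in heredocs just as it does in double-quoted strings
$message = <<<END
You owe {$amounts['payment']} immediately.
END;

print $message . "\n"; // You owe $42.00 immediately.
```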
See Also
For the syntax to interpolate variable variables (such as ${"amount_$i"}), see Recipe 5.4; documentation on the string concatenation operator at http://www.php.net/language.operators.string.
1.9. Trimming Blanks from a String
Problem
You want to remove whitespace from the beginning or end of a string. For example, you want to clean up user input before validating it.
Solution
Use ltrim(), rtrim(), or trim(). ltrim() removes whitespace from the beginning of a string, rtrim() from the end of a string, and trim() from both the beginning and end of a string:
<?php
$zipcode = trim($_REQUEST['zipcode']);
$no_linefeed = rtrim($_REQUEST['text']);
$name = ltrim($_REQUEST['name']);
?>
Discussion
For these functions, whitespace is defined as the following characters: newline, carriage return, space, horizontal and vertical tab, and null.
Trimming whitespace off of strings saves storage space and can make for more precise display of formatted data or text within <pre> tags, for example. If you are doing comparisons with user input, you should trim the data first, so that someone who mistakenly enters “98052 ” (with a trailing space) as their zip code isn’t forced to fix an error that really isn’t one. Trimming before exact text comparisons also ensures that, for example, “salami\n” equals “salami.” It’s also a good idea to normalize string data by trimming it before storing it in a database.
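A minimal sketch of trim() with its default character set (space, tab, newline, carriage return, null byte, and vertical tab), using hypothetical input values. Whitespace inside the string is left alone:

```php
<?php
// Hypothetical user input with stray whitespace around it
$zipcode = " 98052\t\n";
var_dump(trim($zipcode) === '98052'); // bool(true)

// Only leading and trailing whitespace is removed
var_dump(trim("\0\x0B salami sandwich \r\n")); // string(15) "salami sandwich"
```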
The trim() functions can also remove user-specified characters from strings. Pass the characters you want to remove as a second argument. You can indicate a range of characters with two dots between the first and last characters in the range:
<?php
// Remove numerals and space from the beginning of the line
print ltrim('10 PRINT A$',' 0..9');
// Remove semicolon from the end of the line
print rtrim('SELECT * FROM turtles;',';');
?>
This prints:
PRINT A$ SELECT * FROM turtles
PHP also provides chop() as an alias for rtrim(). However, you’re best off using rtrim() instead because PHP’s chop() behaves differently than Perl’s chop() (which is deprecated in favor of chomp(), anyway), and using it can confuse others when they read your code.
See Also
Documentation on trim() at http://www.php.net/trim, ltrim() at http://www.php.net/ltrim, and rtrim() at http://www.php.net/rtrim.
1.10. Generating Comma-Separated Data
Problem
You want to format data as comma-separated values (CSV) so that it can be imported by a spreadsheet or database.
Solution
Use the fputcsv() function to generate a CSV-formatted line from an array of data. Example 1-28 writes the data in $sales into a file.
<?php
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33),
                array('Southeast','2005-01-01','2005-02-01',93.26),
                array('Southwest','2005-01-01','2005-02-01',945.21),
                array('All Regions','--','--',1597.34) );

$fh = fopen('sales.csv','w') or die("Can't open sales.csv");
foreach ($sales as $sales_line) {
    if (fputcsv($fh, $sales_line) === false) {
        die("Can't write CSV line");
    }
}
fclose($fh) or die("Can't close sales.csv");
?>
Discussion
To print the CSV-formatted data instead of writing it to a file, use the special output stream php://output, as shown in Example 1-29.
<?php
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33),
                array('Southeast','2005-01-01','2005-02-01',93.26),
                array('Southwest','2005-01-01','2005-02-01',945.21),
                array('All Regions','--','--',1597.34) );

$fh = fopen('php://output','w');
foreach ($sales as $sales_line) {
    if (fputcsv($fh, $sales_line) === false) {
        die("Can't write CSV line");
    }
}
fclose($fh);
?>
To put the CSV-formatted data into a string instead of printing it or writing it to a file, combine the technique in Example 1-29 with output buffering, as shown in Example 1-30.
<?php
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33),
                array('Southeast','2005-01-01','2005-02-01',93.26),
                array('Southwest','2005-01-01','2005-02-01',945.21),
                array('All Regions','--','--',1597.34) );

ob_start();
$fh = fopen('php://output','w') or die("Can't open php://output");
foreach ($sales as $sales_line) {
    if (fputcsv($fh, $sales_line) === false) {
        die("Can't write CSV line");
    }
}
fclose($fh) or die("Can't close php://output");
$output = ob_get_contents();
ob_end_clean();
?>
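Another way to capture CSV in a string, without output buffering, is the in-memory stream php://memory: write with fputcsv(), rewind, and read everything back with stream_get_contents(). This sketch uses a hypothetical row to show how fputcsv() quotes fields that contain commas or double quotes; the separator, enclosure, and escape arguments are passed explicitly:

```php
<?php
// Hypothetical row: a field with a comma, a field with quotes, and a number
$row = array('Smith, John', 'He said "hi"', 42);

// php://memory is an in-memory stream, so no file is created
$fh = fopen('php://memory', 'w+');
fputcsv($fh, $row, ',', '"', '\\');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

print $csv; // "Smith, John","He said ""hi""",42
```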
See Also
Documentation on fputcsv() at http://www.php.net/fputcsv; Recipe 8.12 has more information about output buffering.
1.11. Parsing Comma-Separated Data
Problem
You have data in comma-separated values (CSV) format—for example, a file exported from Excel or a database—and you want to extract the records and fields into a format you can manipulate in PHP.
Solution
If the CSV data is in a file (or available via a URL), open the file with fopen() and read in the data with fgetcsv(). Example 1-31 prints out CSV data in an HTML table.
<?php
$fp = fopen('sample2.csv','r') or die("can't open file");
print "<table>\n";
while($csv_line = fgetcsv($fp)) {
    print '<tr>';
    for ($i = 0, $j = count($csv_line); $i < $j; $i++) {
        print '<td>'.htmlentities($csv_line[$i]).'</td>';
    }
    print "</tr>\n";
}
// double quotes so that \n is a newline, not a literal backslash-n
print "</table>\n";
fclose($fp) or die("can't close file");
?>
Discussion
In PHP 4, you must provide a second argument to fgetcsv() that is a value larger than the maximum length of a line in your CSV file. (Don’t forget to count the end-of-line whitespace.) In PHP 5 the line length is optional. Without it, fgetcsv() reads in an entire line of data. (Or, in PHP 5.0.4 and later, you can pass a line length of 0 to do the same thing.) If your average line length is more than 8,192 bytes, your program may run faster if you specify an explicit line length instead of letting PHP figure it out.
You can pass fgetcsv() an optional third argument, a delimiter to use instead of a comma (,). However, using a different delimiter somewhat defeats the purpose of CSV as an easy way to exchange tabular data.
Don’t be tempted to bypass fgetcsv() and just read a line in and explode() on the commas. CSV is more complicated than that, able to deal with field values that have, for example, literal commas in them that should not be treated as field delimiters. Using fgetcsv() protects you and your code from subtle errors.
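The difference is easy to see on a single line of CSV. This sketch uses str_getcsv() (PHP 5.3 and later), which parses a string with the same rules fgetcsv() applies to a file; the input line is hypothetical:

```php
<?php
$line = '"Smith, John",42,"Boston"';

// A naive split breaks the quoted field at its embedded comma
$naive = explode(',', $line);
print count($naive) . "\n";  // 4

// str_getcsv() honors the quotes
$fields = str_getcsv($line, ',', '"', '\\');
print count($fields) . "\n"; // 3
print $fields[0] . "\n";     // Smith, John
```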
See Also
Documentation on fgetcsv() at http://www.php.net/fgetcsv.
1.12. Generating Fixed-Width Field Data Records
Solution
Use pack() with a format string that specifies a sequence of space-padded strings. Example 1-32 transforms an array of data into fixed-width records.
<?php
$books = array( array('Elmer Gantry', 'Sinclair Lewis', 1927),
                array('The Scarlatti Inheritance','Robert Ludlum',1971),
                array('The Parsifal Mosaic','William Styron',1979) );

foreach ($books as $book) {
    print pack('A25A15A4', $book[0], $book[1], $book[2]) . "\n";
}
?>
Discussion
The format string A25A15A4 tells pack() to transform its subsequent arguments into a 25-character space-padded string, a 15-character space-padded string, and a 4-character space-padded string. For space-padded fields in fixed-width records, pack() provides a concise solution.
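The A format both pads and truncates, so every field comes out exactly the requested width. A minimal sketch with hypothetical values:

```php
<?php
// 'A' pads short values with spaces and truncates long ones
var_dump(pack('A5', 'ab'));          // string(5) "ab   "
var_dump(pack('A5', 'abcdefgh'));    // string(5) "abcde"
var_dump(pack('A5A3', 'ab', 'xyz')); // string(8) "ab   xyz"
```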
To pad fields with something other than a space, however, use substr() to ensure that the field values aren’t too long and str_pad() to ensure that the field values aren’t too short. Example 1-33 transforms an array of records into fixed-width records with period-padded (.) fields.
<?php
$books = array( array('Elmer Gantry', 'Sinclair Lewis', 1927),
                array('The Scarlatti Inheritance','Robert Ludlum',1971),
                array('The Parsifal Mosaic','William Styron',1979) );

foreach ($books as $book) {
    $title  = str_pad(substr($book[0], 0, 25), 25, '.');
    $author = str_pad(substr($book[1], 0, 15), 15, '.');
    $year   = str_pad(substr($book[2], 0, 4), 4, '.');
    print "$title$author$year\n";
}
?>
See Also
Documentation on pack() at http://www.php.net/pack and on str_pad() at http://www.php.net/str_pad. Recipe 1.16 discusses pack() format strings in more detail.
1.13. Parsing Fixed-Width Field Data Records
Problem
You need to break apart fixed-width records in strings.
Solution
Use substr() as shown in Example 1-34.
<?php
$fp = fopen('fixed-width-records.txt','r') or die ("can't open file");
while ($s = fgets($fp,1024)) {
    $fields[1] = substr($s,0,10);  // first field: first 10 characters of the line
    $fields[2] = substr($s,10,5);  // second field: next 5 characters of the line
    $fields[3] = substr($s,15,12); // third field: next 12 characters of the line
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");
?>
Or unpack(), as shown in Example 1-35.
<?php
$fp = fopen('fixed-width-records.txt','r') or die ("can't open file");
while ($s = fgets($fp,1024)) {
    // an associative array with keys "title", "author", and "publication_year"
    $fields = unpack('A25title/A14author/A4publication_year',$s);
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");
?>
Discussion
Data in which each field is allotted a fixed number of characters per line may look like this list of book titles, authors, and publication dates:
<?php
$booklist=<<<END
Elmer Gantry             Sinclair Lewis1927
The Scarlatti InheritanceRobert Ludlum 1971
The Parsifal Mosaic      Robert Ludlum 1982
Sophie's Choice          William Styron1979
END;
?>
In each line, the title occupies the first 25 characters, the author’s name the next 14 characters, and the publication year the next 4 characters. Knowing those field widths, you can easily use substr() to parse the fields into an array:
<?php
$books = explode("\n",$booklist);

for($i = 0, $j = count($books); $i < $j; $i++) {
    $book_array[$i]['title'] = substr($books[$i],0,25);
    $book_array[$i]['author'] = substr($books[$i],25,14);
    $book_array[$i]['publication_year'] = substr($books[$i],39,4);
}
?>
Exploding $booklist into an array of lines makes the looping code the same whether it’s operating over a string or a series of lines read in from a file.
The loop can be made more flexible by specifying the field names and widths in a separate array that can be passed to a parsing function, as shown in the pc_fixed_width_substr() function in Example 1-36.
<?php
function pc_fixed_width_substr($fields,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $line_pos = 0;
        foreach($fields as $field_name => $field_length) {
            $r[$i][$field_name] = rtrim(substr($data[$i],$line_pos,$field_length));
            $line_pos += $field_length;
        }
    }
    return $r;
}

$book_fields = array('title' => 25,
                     'author' => 14,
                     'publication_year' => 4);

$book_array = pc_fixed_width_substr($book_fields,$books);
?>
The variable $line_pos keeps track of the start of each field and is advanced by the previous field’s width as the code moves through each line. Use rtrim() to remove trailing whitespace from each field.
You can use unpack() as a substitute for substr() to extract fields. Instead of specifying the field names and widths as an associative array, create a format string for unpack(). A fixed-width field extractor using unpack() looks like the pc_fixed_width_unpack() function shown in Example 1-37.
<?php
function pc_fixed_width_unpack($format_string,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $r[$i] = unpack($format_string,$data[$i]);
    }
    return $r;
}

$book_array = pc_fixed_width_unpack('A25title/A14author/A4publication_year',
                                    $books);
?>
The A format character tells unpack() that the field is a space-padded string, and unpack() strips that padding. So there’s no need to rtrim() off the trailing spaces.
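The two functions describe the same layout in different ways: one takes a field-name => width array, the other a format string. You can generate the second from the first. Below is a minimal sketch of that idea. fixed_width_format() is a hypothetical helper, not a PHP function, and the sample line is made up.

```php
<?php
// fixed_width_format() is a hypothetical helper (not part of PHP): it turns
// a field-name => width array into an unpack() format string
function fixed_width_format(array $fields) {
    $parts = array();
    foreach ($fields as $name => $width) {
        // "A" = space-padded string; the width comes before the key name
        $parts[] = 'A' . $width . $name;
    }
    return implode('/', $parts);
}

$book_fields = array('title' => 25, 'author' => 14, 'publication_year' => 4);
$format = fixed_width_format($book_fields);
// $format is 'A25title/A14author/A4publication_year'

// A made-up fixed-width line: title padded to 25 bytes, author to 14, then the year
$line = str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 14) . '1927';
$book = unpack($format, $line);
print_r($book);
```

Keeping the widths in one array means a layout change only has to be made in one place, whichever parsing function you use.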
Once either function has parsed the fields into $book_array, the data can be printed as an HTML table, for example:
<?php
$book_array = pc_fixed_width_unpack('A25title/A14author/A4publication_year',
                                    $books);
print "<table>\n";
// print a header row
print '<tr><td>';
print join('</td><td>',array_keys($book_array[0]));
print "</td></tr>\n";
// print each data row
foreach ($book_array as $row) {
    print '<tr><td>';
    print join('</td><td>',array_values($row));
    print "</td></tr>\n";
}
// double quotes, so \n is a newline rather than a literal backslash-n
print "</table>\n";
?>
Joining data on </td><td> produces a table row that is missing its first <td> and last </td>. We produce a complete table row by printing out <tr><td> before the joined data and </td></tr> after it.
Both substr() and unpack() have equivalent capabilities when the fixed-width fields are strings. unpack() is the better solution when the fields aren’t just strings, because it can decode numbers and other binary data directly.
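Consider a hypothetical record layout: a 10-byte space-padded name, then a 16-bit and a 32-bit big-endian unsigned number. The sketch below uses that layout, which is invented for illustration. substr() would hand back the raw bytes of the numeric fields, while unpack() decodes them in the same call.

```php
<?php
// Made-up record: 10-byte space-padded name (A10), a 16-bit big-endian
// unsigned number (n), and a 32-bit big-endian unsigned number (N)
$record = pack('A10nN', 'widget', 443, 70000);

// 10 + 2 + 4 = 16 bytes in total
print strlen($record) . "\n";

// One unpack() call yields a string and two integers, each with its own key
$fields = unpack('A10name/nport/Ncount', $record);
print_r($fields);
```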
If all of your fields are the same size, str_split() is a handy shortcut for chopping up incoming data. Available since PHP 5, it returns an array made up of sections of a string. Example 1-38 uses str_split() to break apart a string into 32-byte pieces.
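Here is a minimal sketch of that technique. The 80-byte input is made up; with 32-byte pieces, the last element holds whatever is left over.

```php
<?php
// A made-up 80-byte string standing in for incoming data
$data = str_repeat('0123456789', 8);

// Break it into 32-byte pieces; the final piece holds the remaining 16 bytes
$chunks = str_split($data, 32);

foreach ($chunks as $i => $chunk) {
    printf("%d: %s (%d bytes)\n", $i, $chunk, strlen($chunk));
}
```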
See Also
For more information about unpack(), see Recipe 1.16 and http://www.php.net/unpack; documentation on str_split() at http://www.php.net/str_split; Recipe 4.8 discusses join().
1.14. Taking Strings Apart
Problem
You need to break a string into pieces. For example, you want to access each line that a user enters in a <textarea> form field.
Solution
Use explode() if what separates the pieces is a constant string:
<?php $words = explode(' ','My sentence is not very complicated'); ?>
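Use split() or preg_split() if you need a POSIX or Perl-compatible regular expression to describe the separator. Note that split() and spliti() were deprecated in PHP 5.3 and removed in PHP 7.0, while preg_split() works in every version: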
<?php
$words = split(' +','This sentence has some extra whitespace in it.');
$words = preg_split('/\d\. /','my day: 1. get up 2. get dressed 3. eat toast');
$lines = preg_split('/[\n\r]+/',$_REQUEST['textarea']);
?>
Use spliti() or the /i flag to preg_split() for case-insensitive separator matching:
<?php
$words = spliti(' x ','31 inches x 22 inches X 9 inches');
$words = preg_split('/ x /i','31 inches x 22 inches X 9 inches');
?>
Discussion
The simplest solution of the bunch is explode(). Pass it your separator string, the string to be separated, and an optional limit on how many elements should be returned:
<?php $dwarves = 'dopey,sleepy,happy,grumpy,sneezy,bashful,doc'; $dwarf_array = explode(',',$dwarves); ?>
This makes $dwarf_array a seven-element array, so print_r($dwarf_array) prints:
Array
(
    [0] => dopey
    [1] => sleepy
    [2] => happy
    [3] => grumpy
    [4] => sneezy
    [5] => bashful
    [6] => doc
)
If the specified limit is less than the number of possible chunks, the last chunk contains the remainder:
<?php $dwarf_array = explode(',',$dwarves,5); print_r($dwarf_array); ?>
This prints:
Array
(
    [0] => dopey
    [1] => sleepy
    [2] => happy
    [3] => grumpy
    [4] => sneezy,bashful,doc
)
explode() treats the separator literally. If you specify a comma and a space as the separator, it breaks the string only on a comma followed by a space, not on a lone comma or a lone space.
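The following is a quick illustration with a made-up string. Only the first comma is followed by a space, so only that comma splits the string.

```php
<?php
$list = 'apple, banana,cherry';

// The separator ', ' must match exactly: a comma followed by a space
$parts = explode(', ', $list);
print_r($parts);
// 'banana,cherry' stays together because no space follows that comma
```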
With split(), you have more flexibility. Instead of a string literal as a separator, it uses a POSIX regular expression:
<?php $more_dwarves = 'cheeky,fatso, wonder boy, chunky,growly, groggy, winky'; $more_dwarf_array = split(', ?',$more_dwarves); ?>
This regular expression splits on a comma followed by an optional space, which treats all the new dwarves properly. A dwarf with a space in his name isn’t broken up, but everyone is broken apart whether they are separated by “,” or “, ”. print_r($more_dwarf_array) prints:
Array
(
    [0] => cheeky
    [1] => fatso
    [2] => wonder boy
    [3] => chunky
    [4] => growly
    [5] => groggy
    [6] => winky
)
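Because split() no longer exists in PHP 7.0 and later, current code would write the same comma-plus-optional-space split with preg_split(). For this pattern, the only change is adding the regular expression delimiters:

```php
<?php
$more_dwarves = 'cheeky,fatso, wonder boy, chunky,growly, groggy, winky';

// Same pattern as the split() call (a comma followed by an optional space),
// wrapped in the slashes that preg_split() patterns require
$more_dwarf_array = preg_split('/, ?/', $more_dwarves);
print_r($more_dwarf_array);
```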
Similar to split() is preg_split(), which uses a Perl-compatible regular expression engine instead of a POSIX regular expression engine. With preg_split(), you can take advantage of various Perl-ish regular expression extensions, as well as tricks such as including the separator text in the returned array of strings:
<?php
$math = "3 + 2 / 7 - 9";
$stack = preg_split('/ *([+\-\/*]) */',$math,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($stack);
?>
This prints:
Array
(
    [0] => 3
    [1] => +
    [2] => 2
    [3] => /
    [4] => 7
    [5] => -
    [6] => 9
)
The separator regular expression looks for the four mathematical operators (+, -, /, *), surrounded by optional leading or trailing spaces. The PREG_SPLIT_DELIM_CAPTURE flag tells preg_split() to include in the returned array any text matched by a parenthesized part of the separator regular expression. Only the mathematical operator character class is in parentheses, so the returned array doesn’t have any spaces in it.
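The effect of the flag is easiest to see side by side. This sketch runs the same pattern with and without PREG_SPLIT_DELIM_CAPTURE:

```php
<?php
$math = "3 + 2 / 7 - 9";
$pattern = '/ *([+\-\/*]) */';

// Without PREG_SPLIT_DELIM_CAPTURE, the matched operators are discarded
$operands = preg_split($pattern, $math);

// With it, each captured operator appears between the pieces it separated
$stack = preg_split($pattern, $math, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($operands);
print_r($stack);
```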
See Also
Regular expressions are discussed in more detail in Chapter 22; documentation on explode() at http://www.php.net/explode, split() at http://www.php.net/split, and preg_split() at http://www.php.net/preg-split.
1.15. Wrapping Text at a Certain Line Length
Problem
You need to wrap lines in a string. For example, you want to display text in <pre>/</pre> tags but have it stay within a regularly sized browser window.
Solution
Use wordwrap():
<?php
$s = "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.";
print "<pre>\n".wordwrap($s)."\n</pre>";
?>
This prints:
<pre>
Four score and seven years ago our fathers brought forth on this continent
a new nation, conceived in liberty and dedicated to the proposition that
all men are created equal.
</pre>
Discussion
By default, wordwrap() wraps text at 75 characters per line. An optional second argument specifies a different line length:
<?php print wordwrap($s,50); ?>
This prints:
Four score and seven years ago our fathers brought
forth on this continent a new nation, conceived in
liberty and dedicated to the proposition that all
men are created equal.
An optional third argument sets the string used for line breaks, so you can use something other than the default \n. For double spacing, use "\n\n":
<?php print wordwrap($s,50,"\n\n"); ?>
This prints:
Four score and seven years ago our fathers brought

forth on this continent a new nation, conceived in

liberty and dedicated to the proposition that all

men are created equal.
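Because the break string can be anything, it can also be HTML. The minimal sketch below wraps the same sentence at 50 characters using "<br />\n" as the break. The visible breaks then survive outside <pre> tags, and the HTML source stays readable.

```php
<?php
$s = 'Four score and seven years ago our fathers brought forth on this '
   . 'continent a new nation, conceived in liberty and dedicated to the '
   . 'proposition that all men are created equal.';

// "<br />\n" breaks the line in the browser and in the page source
$html = wordwrap($s, 50, "<br />\n");
print $html;
```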
There is an optional fourth argument to wordwrap() that controls the treatment of words that are longer than the specified line length. If this argument is true (or 1), these words are broken at the line length. Otherwise, they extend past the specified line length:
<?php
print wordwrap('jabberwocky',5) . "\n";
print wordwrap('jabberwocky',5,"\n",1);
?>
This prints:
jabberwocky
jabbe
rwock
y
See Also
Documentation on wordwrap() at http://www.php.net/wordwrap.
1.16. Storing Binary Data in Strings
Problem
You want to parse a string that contains values encoded as a binary structure or encode values into a string. For example, you want to store numbers in their binary representation instead of as sequences of ASCII characters.
Solution
Use pack() to store binary data in a string:
<?php $packed = pack('S4',1974,106,28225,32725); ?>
Use unpack() to extract binary data from a string:
<?php $nums = unpack('S4',$packed); ?>
Discussion
The first argument to pack() is a format string that describes how to encode the data passed in the remaining arguments. The format string S4 tells pack() to produce four unsigned short 16-bit numbers in machine byte order from its input data. Given 1974, 106, 28225, and 32725 as input on a little-endian machine, this returns eight bytes: 182, 7, 106, 0, 65, 110, 213, and 127. Each two-byte pair, low byte first, corresponds to one of the input numbers: 7 * 256 + 182 = 1974; 0 * 256 + 106 = 106; 110 * 256 + 65 = 28225; 127 * 256 + 213 = 32725.
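To check that byte arithmetic yourself, you can print each packed byte with ord(). The sketch below uses the v format (16-bit, little-endian) instead of S. That substitution is an assumption made for portability, so the bytes come out the same regardless of the machine’s byte order.

```php
<?php
// v4: four 16-bit unsigned numbers, always little-endian (S4 would use
// the machine's byte order instead)
$packed = pack('v4', 1974, 106, 28225, 32725);

// Read each byte back as a number
$bytes = array();
for ($i = 0; $i < strlen($packed); $i++) {
    $bytes[] = ord($packed[$i]);
}
print implode(', ', $bytes) . "\n";

// Rebuild each number as high byte * 256 + low byte
for ($i = 0; $i < count($bytes); $i += 2) {
    print $bytes[$i + 1] . ' * 256 + ' . $bytes[$i] . ' = '
        . ($bytes[$i + 1] * 256 + $bytes[$i]) . "\n";
}
```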
The first argument to unpack() is also a format string, and the second argument is the data to decode. Passing a format string of S4 and the eight-byte sequence that pack() produced returns a four-element array of the original numbers. print_r($nums) prints:
Array
(
    [1] => 1974
    [2] => 106
    [3] => 28225
    [4] => 32725
)
In unpack(), format characters and their count can be followed by a string to be used as an array key. For example:
<?php $nums = unpack('S4num',$packed); print_r($nums); ?>
This prints:
Array
(
    [num1] => 1974
    [num2] => 106
    [num3] => 28225
    [num4] => 32725
)
Multiple format characters must be separated with / in unpack():
<?php $nums = unpack('S1a/S1b/S1c/S1d',$packed); print_r($nums); ?>
This prints:
Array
(
    [a] => 1974
    [b] => 106
    [c] => 28225
    [d] => 32725
)
The format characters that can be used with pack() and unpack() are listed in Table 1-2.
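Format character | Data type
a | NUL-padded string
A | Space-padded string
h | Hex string, low nibble first
H | Hex string, high nibble first
c | signed char
C | unsigned char
s | signed short (16-bit, machine byte order)
S | unsigned short (16-bit, machine byte order)
n | unsigned short (16-bit, big-endian byte order)
v | unsigned short (16-bit, little-endian byte order)
i | signed integer (machine-dependent size and byte order)
I | unsigned integer (machine-dependent size and byte order)
l | signed long (32-bit, machine byte order)
L | unsigned long (32-bit, machine byte order)
N | unsigned long (32-bit, big-endian byte order)
V | unsigned long (32-bit, little-endian byte order)
f | float (machine-dependent size and representation)
d | double (machine-dependent size and representation)
x | NUL byte
X | Back up one byte
@ | NUL-fill to absolute position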
For a, A, h, and H, a number after the format character indicates how long the string is; for h and H, the number counts hex digits rather than bytes. For example, A25 means a 25-character space-padded string. For other format characters, a following number means how many of that type appear consecutively in a string. Use * to take the rest of the available data.
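The following small cases illustrate those counting rules. The values are made up. A25 uses the number as a length, n3 uses it as a count, and * consumes everything that remains.

```php
<?php
// A25: the number is a length, so the string is space-padded to 25 bytes
$title = pack('A25', 'Elmer Gantry');

// n3: the number is a count, so three 16-bit big-endian values (6 bytes)
$three = pack('n3', 1, 2, 3);

// H* and h*: hex strings that consume all remaining data; H puts the
// high nibble of each byte first, h puts the low nibble first
$high = unpack('H*hex', "\x12\xab");
$low  = unpack('h*hex', "\x12\xab");

var_dump(strlen($title), strlen($three), $high['hex'], $low['hex']);
```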
You can convert between data types with unpack(). This example fills the array $ascii with the ASCII values of each character in $s:
<?php $s = 'platypus'; $ascii = unpack('c*',$s); print_r($ascii); ?>
This prints:
Array
(
    [1] => 112
    [2] => 108
    [3] => 97
    [4] => 116
    [5] => 121
    [6] => 112
    [7] => 117
    [8] => 115
)
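Going the other way, from numbers back to a string, is a job for pack(). pack() takes each value as a separate argument, so this sketch passes the format and the array through call_user_func_array(). For single-byte values, implode() over array_map('chr', ...) is an equivalent shortcut.

```php
<?php
$s = 'platypus';
$ascii = unpack('c*', $s);

// Prepend the format string; array_merge() renumbers the keys from 0, so
// call_user_func_array() passes the values as positional arguments
$rebuilt = call_user_func_array('pack', array_merge(array('c*'), $ascii));
print $rebuilt . "\n";

// Alternative without pack(): convert each code back with chr()
print implode('', array_map('chr', $ascii)) . "\n";
```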
See Also
Documentation on pack() at http://www.php.net/pack and unpack() at http://www.php.net/unpack.
1.17. Program: Downloadable CSV File
The header() function changes the content type of what your PHP program outputs, and the fputcsv() function formats the data. Combining them lets you send CSV files to browsers, which automatically hand them off to a spreadsheet program (or whatever application is configured on a particular client system to handle CSV files). Example 1-39 formats the results of an SQL SELECT query as CSV data and provides the correct headers so that the browser handles it properly.
<?php
require_once 'DB.php';

// Connect to the database
$db = DB::connect('mysql://david:hax0r@localhost/phpcookbook');

// Retrieve data from the database
$sales_data = $db->getAll('SELECT region, start, end, amount FROM sales');

// Open filehandle for fputcsv()
$output = fopen('php://output','w') or die("Can't open php://output");
$total = 0;

// Tell browser to expect a CSV file
header('Content-Type: application/csv');
header('Content-Disposition: attachment; filename="sales.csv"');

// Print header row
fputcsv($output,array('Region','Start Date','End Date','Amount'));
// Print each data row and increment $total
foreach ($sales_data as $sales_line) {
    fputcsv($output, $sales_line);
    $total += $sales_line[3];
}

// Print total row and close file handle
fputcsv($output,array('All Regions','--','--',$total));
fclose($output) or die("Can't close php://output");
?>
Example 1-39 sends two headers to ensure that the browser handles the CSV output properly. The first header, Content-Type, tells the browser that the output is not HTML, but CSV. The second header, Content-Disposition, tells the browser not to display the output but to attempt to load an external program to handle it. The filename attribute of this header supplies a default filename for the browser to use for the downloaded file.
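To see what fputcsv() actually writes, you can point it at an in-memory stream instead of php://output. The rows in this sketch are made up. A field containing a comma gets enclosed in double quotes, and embedded double quotes are doubled. The explicit ',', '"', '\\' arguments are simply the defaults spelled out; recent PHP releases deprecate relying on the default escape argument.

```php
<?php
// Made-up rows; one contains a comma, one contains double quotes
$rows = array(
    array('Region', 'Amount'),
    array('East, Coastal', 100),
    array('Say "hi"', 250),
);

// php://memory behaves like a file, so the CSV text can be read back
$fh = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    // delimiter, enclosure, and escape character (the defaults)
    fputcsv($fh, $row, ',', '"', '\\');
}
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

print $csv;
```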
If you want to provide different views of the same data, you can combine the formatting code in one page and use a query string variable to determine which kind of data formatting to do. In Example 1-40, the format query string variable controls whether the results of an SQL SELECT query are returned as an HTML table or CSV.
<?php
$db = new PDO('sqlite:/usr/local/data/sales.db');
$query = $db->query('SELECT region, start, end, amount FROM sales');
// fetchAll() belongs to the PDOStatement that query() returns
$sales_data = $query->fetchAll(PDO::FETCH_NUM);
$total = 0;
$column_headers = array('Region','Start Date','End Date','Amount');
// Decide what format to use (HTML unless format=csv is in the query string)
$format = (isset($_GET['format']) && $_GET['format'] == 'csv') ? 'csv' : 'html';
// Print format-appropriate beginning
if ($format == 'csv') {
    $output = fopen('php://output','w') or die("Can't open php://output");
    header('Content-Type: application/csv');
    header('Content-Disposition: attachment; filename="sales.csv"');
    fputcsv($output,$column_headers);
} else {
    echo '<table><tr><th>';
    echo implode('</th><th>', $column_headers);
    echo '</th></tr>';
}
foreach ($sales_data as $sales_line) {
    // Print format-appropriate line
    if ($format == 'csv') {
        fputcsv($output, $sales_line);
    } else {
        // Escape each value so the data can't inject HTML into the page
        echo '<tr><td>' . implode('</td><td>', array_map('htmlspecialchars', $sales_line)) . '</td></tr>';
    }
    $total += $sales_line[3];
}
$total_line = array('All Regions','--','--',$total);
// Print format-appropriate footer
if ($format == 'csv') {
    fputcsv($output,$total_line);
    fclose($output) or die("Can't close php://output");
} else {
    echo '<tr><td>' . implode('</td><td>', $total_line) . '</td></tr>';
    echo '</table>';
}
?>
Accessing the program in Example 1-40 with format=csv in the query string causes it to return CSV-formatted output. Any other format value in the query string, or none at all, causes it to return HTML output. The logic that sets $format to CSV or HTML could easily be extended to other output formats like XML. You may have many places where you want to offer the same data for download in multiple formats. In that case, package the code in Example 1-40 into a function that accepts an array of data and a format specifier and then displays the right results.
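Here is one possible shape for such a function, offered as a sketch rather than the recipe’s own code. format_table() is a hypothetical name, and it handles only 'csv' and 'html'. It returns the formatted data as a string, and the caller decides which headers to send. The sample rows are made up.

```php
<?php
// format_table() is a hypothetical helper, not part of PHP or the recipe.
// It returns CSV or an HTML table as a string; sending headers is left
// to the caller.
function format_table(array $column_headers, array $rows, $format) {
    if ($format == 'csv') {
        // php://temp lets fputcsv() write into a buffer we can read back
        $fh = fopen('php://temp', 'w+');
        fputcsv($fh, $column_headers, ',', '"', '\\');
        foreach ($rows as $row) {
            fputcsv($fh, $row, ',', '"', '\\');
        }
        rewind($fh);
        $out = stream_get_contents($fh);
        fclose($fh);
        return $out;
    }

    // Anything else: an HTML table with every cell escaped
    $out = '<table><tr><th>'
         . implode('</th><th>', array_map('htmlspecialchars', $column_headers))
         . "</th></tr>\n";
    foreach ($rows as $row) {
        $out .= '<tr><td>'
              . implode('</td><td>', array_map('htmlspecialchars', $row))
              . "</td></tr>\n";
    }
    return $out . "</table>\n";
}

// Made-up data standing in for a database result
$headers = array('Region', 'Amount');
$rows = array(array('East', 100), array('West', 250));

$format = (isset($_GET['format']) && $_GET['format'] == 'csv') ? 'csv' : 'html';
if ($format == 'csv') {
    header('Content-Type: application/csv');
    header('Content-Disposition: attachment; filename="sales.csv"');
}
print format_table($headers, $rows, $format);
```

Returning a string rather than printing keeps the function easy to reuse, for example to save the same CSV to a file.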