In addition to all the
built-in SQL functions, such as lower( )
and
upper( )
, you can extend SQLite to include
functions of your own written in PHP. These are known as
user-defined functions, or
UDFs for short. With a UDF, you embed logic
into SQLite and avoid doing it yourself in PHP. This way, you take
advantage of all the features inherent in a database, such as sorting
and finding distinct entries.
There are two types of UDFs: standard and aggregate. Standard UDFs are one-to-one: when given a single row of data, they return a single result. Functions that change case, calculate cryptographic hashes, and compute the sales tax on an item in a shopping cart are all standard functions. In contrast, aggregate functions are many-to-one: when using an aggregate function, SQLite passes it multiple rows and receives only a single value.
Although it is not a UDF, the most popular aggregate function is
count( )
, which returns the number of rows passed
to it. Besides count( )
, most aggregate functions
are related to statistics: finding the average, standard deviation,
or the maximum or minimum value of a set of data points.
UDFs are good for chopping up strings so you can perform nonstandard collations and groupings. For example, you want to sort through a list of URLs, maybe from a referrer log file, and create a list of unique hostnames sorted alphabetically. So, http://www.example.com/directory/index.html and http://www.example.com/page.html would both map to one entry: http://www.example.com.
To do this in PHP, you need to retrieve all the URLs, process them
inside your script, and then sort them. Plus, somewhere in all that,
you need to do the deduping. However, if it weren’t
for that pesky URL-conversion process, this could all be done in SQL
using the DISTINCT
and ORDER BY
keywords.
With a UDF like the one shown in Example 4-3, you foist all that hard work back onto SQLite where it belongs.
Example 4-3. Retrieving unique hostnames using an SQLite UDF
// CREATE table and INSERT URLs
$db = sqlite_open('/www/support/log.db');
$sql = 'CREATE TABLE access_log(url);';
$urls = array('http://www.example.com/directory/index.html',
'http://www.example.com/page.html');
foreach ($urls as $url) {
$sql .= "INSERT INTO access_log VALUES('$url');";
}
sqlite_query($db, $sql);
// UDF written in PHP
function url2host($url) {
$parts = parse_url($url);
return "$parts[scheme]://$parts[host]";
}
// Tell SQLite to associate PHP function url2host( ) with the
// SQL function host( ). Say that host( ) will only take 1 argument.
sqlite_create_function($db, 'host', 'url2host', 1);
// Do the query
$r = sqlite_query($db, 'SELECT DISTINCT host(lower(url)) AS clean_host
FROM access_log ORDER BY clean_host;');
// Loop through results
while ($row = sqlite_fetch_array($r)) {
print "$row[clean_host]\n";
}
http://www.example.com
To use a UDF, you first write a regular function in PHP. The
function’s arguments are what you want to pass in
during the SELECT
, and the function should
return
a single value. The url2host( )
function takes a URL; calls the built-in PHP function
parse_url( )
to
break the URL into its component parts; and returns a string
containing the scheme, ://
, and the host. So,
http://www.example.com/directory/index.html
gets
broken apart into many pieces. http
is stored into
$parts['scheme']
and
www.example.com
goes into
$parts['host']
.[3] This
creates a return value of http://www.example.com
.
The next step is to register url2host( )
with
SQLite using
sqlite_create_function( )
. This function takes four parameters: the database
handle, the name you want the function to be called inside SQLite,
the name of your function written in PHP, and the number of arguments
your function expects. The last parameter is optional, but if you
know for certain that your function accepts only a specific number of
parameters, providing this information helps SQLite optimize things
behind the scenes. In this example, the SQL function is
host( )
, while the PHP function is
url2host( )
. These names can be the same;
they’re different here to make the distinction
between them clear.
Now you can use host( )
inside any SQL calls using
that database connection. The SQL in Example 4-3
SELECT
s host(lower(url)) AS clean_host
. This takes the URL stored in the
url
column, converts it to lowercase, and calls
the UDF host( )
.
The function is not permanently registered with the database, and
goes away when you close the database. If you want to use it when you
reopen the database, you must reregister it. Also, the function is
registered only for that database; if you open up a new database
using sqlite_connect( )
, you need to call
sqlite_create_function( )
again.
The returned string is then named AS clean_host
;
this lets you refer to the results later on in the SQL query and also
access the value in PHP using that name. Since
you’re still in SQLite, you can take advantage of
this to sort the list using ORDER BY host
. This
sorts the results in alphabetical order, starting at
a
.
Now that’s cool, but it’s not
that cool. What is cool is
SQLite’s ability to call UDFs in the ORDER BY
clause. If you use the default alphabetical sort,
http://php.example.org and
http://www.example.org won’t be
near each other, because “p” and
“w” aren’t next to
each other in the alphabet. Yet both hosts are located under the
example.org
domain, so it makes sense that they
should be listed together. Not surprisingly, another UDF saves the
day.
function reverse_host($url) { list ($scheme, $host) = explode('://', $url); return join('.',array_reverse(explode('.',$host))); } sqlite_create_function($db, 'reverse', 'reverse_host', 1);
The reverse_host( )
function takes a URL and chops
it into two bits, the scheme and host, by explode( )
ing on ://
. You can do this because the
previous UDF, host( )
, has specifically created
strings in this manner. Next, $host
is passed
through a series of three functions that splits it up into its
component parts, reverses those parts, and then glues them back
together. This flips around the pieces of the host separated by
periods, but doesn’t actually reverse the text. So,
www.example.org
becomes
org.example.www
and not
gro.elpmaxe.www
or
www.elpmaxe.gro
.
This reversed hostname is perfect for sorting. When you alphabetize
org.example.www
, it nicely sits next to all its
brethren in the .org
top-level domain, then sorts
by the other hosts inside example.org
, and finally
orders the remaining subdomains. And that’s exactly
what you want.
You then register reverse_host( )
in the exact
same way you registered url2string( )
, using
sqlite_create_function( )
.
Once that’s done, you can call reverse( )
inside your SQL query:
$r = sqlite_query($db, 'SELECT DISTINCT host(lower(url)) AS clean_host FROM access_log ORDER BY reverse(clean_host);');
Given the following list of URLs as input:
http://www.example.com http://php.example.org http://www.example.org
you get the following as output:
http://www.example.com
http://php.example.org
http://www.example.org
The URL containing php.example.com
has filtered
down in the list below www.example.com
, even
though php
comes before www
in
the alphabet.
In contrast, Example 4-4 shows what you need to do to implement this in PHP without UDFs.
Example 4-4. Sorting unique hostnames without using SQLite UDFs
function url2host($url) { $parts = parse_url($url); return "$parts[scheme]://$parts[host]"; } function reverse_host($url) { list ($scheme, $host) = explode('://', $url); return join('.',array_reverse(explode('.',$host))); } function host_sort($a, $b) { $count_a = $GLOBALS['hosts'][$a]; $count_b = $GLOBALS['hosts'][$b]; if ($count_a < $count_b) { return 1; } if ($count_a > $count_b) { return -1; } return strcasecmp(reverse_host($a), reverse_host($b)); } $hosts = array( ); $r = sqlite_unbuffered_query($db, 'SELECT url FROM access_log'); while ($url = sqlite_fetch_single($r)) { $host = url2host($url); $hosts[$host]++ ; } uksort($hosts, 'host_sort');
This process breaks down into many steps:
Make a database query for
url
s.Retrieve
url
into$url
usingsqlite_fetch_single( )
.Convert
$url
into a host and store it in$host
.Place
$url
as a new element in the$hosts
array and increment that element by1
. This tracks the number of times each URL has appeared.Perform a user-defined key sort on the
$hosts
array.
The sqlite_fetch_single( )
function returns the
first (and in this case only) column from the result as a string.
This allows you to skip the step of saving the result as an array and
then extracting the element, either by using list
or as the 0
th index.
Doing $hosts[$host]++
is a old trick that allows
you to easily count the number of times each key appears in a list.
Since uksort( )
only passes array keys to the
sorting function, host_host( )
is not very
elegant, because it requires using a global variable to determine the
number of hits for each element.
Overall, compared to a UDF, this method requires more memory, execution time, and lines of code, because you’re replicating database functionality inside PHP.
As discussed earlier, most aggregate
functions are statistical functions, such as AVG( )
or STDEV( )
. People usually use
aggregate functions to return a single row from their query, but
that’s not a requirement. You can use them to link
together a set of related rows, to compact your query and return one
row per group.
This extension to the earlier referrer log sorting example shows how to use an aggregate function to provide the total number of hits per hostname, in addition to everything in the previous section:
SELECT DISTINCT host(lower(url)) AS clean_host, COUNT(*) AS hits FROM access_log GROUP BY clean_host ORDER BY hits DESC, reverse(clean_host)
The COUNT(*)
function sums the total number of
rows per host. However, this won’t work without
adding the GROUP BY host
clause.
GROUP
ing rows allows COUNT(*)
to know which sets of entries belong together. Whenever you have an
aggregate function—such as COUNT( )
,
SUM( )
, or any function that takes a set of rows
as input and returns only a single value as its output—use
GROUP BY
when you want your query to return
multiple rows. (If you’re just doing a basic
SELECT COUNT(*) FROM host
to find the total number
of rows in the table, there’s no need for any
GROUP
ing.)
COUNT(*)
is aliased to hits
,
which allows you to refer to it in the ORDER BY
clause. Then, to sort the results first by total hits, from most to
least, and then alphabetically within each total, use ORDER BY hits DESC
, reverse(host)
. By putting
hits
first, you prioritize it over
reverse(clean_host)
and the
DESC
keyword flips the sorting order to descending
(the default is ascending).
Using that query, this set of sites:
http://www.example.org http://www.example.org http://www.example.com http://php.example.org
and this PHP code:
while ($row = sqlite_fetch_array($r)) { print "$row[hits]: $row[clean_host]\n"; }
gives:
2: http://www.example.org
1: http://www.example.com
1: http://php.example.org
Furthermore, to restrict results to sites with more hits than a
specified amount, use a HAVING
clause:
SELECT DISTINCT host(lower(url)) AS clean_host, COUNT(*) AS hits FROM access_log GROUP BY clean_host HAVING hits > 1 ORDER BY hits DESC, reverse(clean_host)
You cannot use WHERE
here, because
WHERE
can only operate on data directly from a
table. Here the restriction hits > 1
compares
against the result of a GROUP BY
, so you need to
employ HAVING
instead.
You can define your own aggregate functions for SQLite in PHP. Unlike standard UDFs, you actually need to define two functions: one that’s called for each row and one that’s called after all the rows have been passed in.
The code in Example 4-5 shows how to create a basic SQLite user-defined aggregate function that calculates the average of a set of numbers.
Example 4-5. Averaging numbers using an SQLite aggregate function
// CREATE table and INSERT numbers
$db = sqlite_open('/www/support/data.db');
$sql = 'CREATE TABLE numbers(number);';
$numbers = array(1, 2, 3, 4, 5);
foreach ($numbers as $n) {
$sql .= "INSERT INTO numbers VALUES($n);";
}
sqlite_query($db, $sql);
// average_step( ) is called on each row.
function average_step(&$existing_data, $new_data) {
$existing_data['total'] += $new_data;
$existing_data['count']++;
}
// average_final( ) computes the average and returns it.
function average_final(&$existing_data) {
return $existing_data['total'] / $existing_data['count'];
}
sqlite_create_aggregate($db, 'average', 'average_step', 'average_final');
$r = sqlite_query($db, 'SELECT average(number) FROM numbers');
$average = sqlite_fetch_single($r);
print $average;
3
First, you define the two aggregate functions in PHP, just as you do
for regular UDFs. However, the first parameter for both functions is
a variable passed by reference that is used to keep track of the
UDF’s state. In this example, you need to track both
the running sum of the numbers and how many rows have contributed to
this total. That’s done in average_step( )
.
In average_final( )
, the final sum is divided by
the number of elements to find the average. This is the value
that’s returned by the function and passed back to
SQLite (and, eventually, to you).
To formally create an aggregate UDF, use
sqlite_create_aggregate( )
. It works like sqlite_create_function( )
, but you pass both PHP function names instead of just
one.
SQLite is not binary safe by default. Requiring PHP to automatically protect against problems caused by binary data causes a significant reduction in speed, so you must manually encode and decode data when it might be anything other than plain text. If your UDFs only operate on text, this isn’t a problem.
Inside a UDF, use
sqlite_udf_binary_decode( )
to convert data stored in SQLite into usable strings in
PHP:
function udf_function_encode($encoded_data) { $data = sqlite_udf_binary_decode($encoded_data); // rest of the function... }
When you’re finished, if the return value might also
be binary unsafe, re-encode it using
sqlite_udf_binary_encode( )
:
function udf_function_decode($encoded_data) { // rest of the function... return sqlite_udf_binary_encode($return_value); }
Get Upgrading to PHP 5 now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.