Chapter 13. Web Automation
Introduction
Most of the time, PHP is part of a web server, sending content to browsers. Even when you run it from the command line, it usually performs a task and then prints some output. PHP can also be useful, however, playing the role of a web client, retrieving URLs and then operating on the content. Most recipes in this chapter cover retrieving URLs and processing the results, although there are a few other tasks in here as well, such as cleaning up URLs and some JavaScript-related operations.
There are many ways to retrieve a remote URL in PHP. Choosing one
method over another depends on your needs for simplicity, control, and
portability. The three methods discussed in this chapter are standard
file functions, the cURL extension, and the HTTP_Request class from PEAR. These three
methods can generally do everything you need and at least one of them
should be available to you whatever your server configuration or ability
to install custom extensions. Other ways to retrieve remote URLs include
the pecl_http extension (http://pecl.php.net/package/pecl_http), which, while
still in development, offers some promising features, and the
fsockopen() function, which opens a
socket over which you send an HTTP request that you construct piece by
piece.
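To make the socket approach concrete, here is a minimal sketch of building a request by hand and sending it with fsockopen(). The helper names build_get_request() and fetch_with_socket() are hypothetical, chosen only for illustration, and the code assumes PHP 7 or later, plain HTTP on port 80, and HTTP/1.0 so that the server does not reply with a chunked body. It is not a substitute for cURL or a full HTTP client.

```php
<?php
// Hypothetical helper: assemble a minimal HTTP/1.0 GET request line by line.
function build_get_request(string $host, string $path = '/'): string
{
    return "GET {$path} HTTP/1.0\r\n"   // request line
         . "Host: {$host}\r\n"          // needed by name-based virtual hosts
         . "Connection: close\r\n"      // ask the server to close when done
         . "\r\n";                      // blank line ends the headers
}

// Hypothetical helper: send the request over a socket and return the raw
// response (status line, headers, and body as one string).
function fetch_with_socket(
    string $host,
    string $path = '/',
    int $port = 80,
    float $timeout = 10.0
): string {
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException(
            "Could not connect to {$host}:{$port}: {$errstr} ({$errno})"
        );
    }

    fwrite($fp, build_get_request($host, $path));
    $response = stream_get_contents($fp); // read until the server closes
    fclose($fp);

    return $response === false ? '' : $response;
}
```

Calling fetch_with_socket('www.example.com', '/') would return the raw response; splitting it once on "\r\n\r\n" with explode() and a limit of 2 separates the headers from the body. Everything a higher-level method handles for you, such as redirects, HTTPS, and chunked responses, is left to the caller here, which is the trade-off of this approach.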
Using a standard file function such as file_get_contents() is simple and
convenient. It automatically follows redirects, so if you use this
function to retrieve the directory http://www.example.com/people and the ...