Chapter 13. Web Automation

Introduction

Most of the time, PHP is part of a web server, sending content to browsers. Even when you run it from the command line, it usually performs a task and then prints some output. PHP can also be useful as a web client, however, retrieving URLs and then operating on their content. Most recipes in this chapter cover retrieving URLs and processing the results, although a few cover other tasks, such as cleaning up URLs and some JavaScript-related operations.

There are many ways to retrieve a remote URL in PHP. Choosing one method over another depends on your needs for simplicity, control, and portability. The three methods discussed in this chapter are standard file functions, the cURL extension, and the HTTP_Request class from PEAR. These three methods can generally do everything you need, and at least one of them should be available to you whatever your server configuration or ability to install custom extensions. Other ways to retrieve remote URLs include the pecl_http extension (http://pecl.php.net/package/pecl_http), which, while still in development, offers some promising features, and the fsockopen() function, which opens a socket over which you send an HTTP request that you construct piece by piece.
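To make the trade-off between methods concrete, here is a minimal sketch of a helper that uses the cURL extension when it is loaded and falls back to a standard file function otherwise. The function name fetch_url() is hypothetical and not part of PHP or of this chapter's recipes. The sketch assumes PHP 7 or later for the type declarations, and the fallback assumes that allow_url_fopen is enabled when you pass it a remote URL.

```php
<?php
// Hypothetical helper (not a built-in): retrieve a URL with whichever
// of two retrieval methods is available in this PHP installation.
function fetch_url(string $url): string
{
    // Prefer the cURL extension when it is loaded, since it offers
    // the most control over the request.
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        // Return the response body as a string instead of printing it.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Follow redirects; cURL does not do this unless asked.
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('cURL failed: ' . curl_error($ch));
        }
        return $body;
    }

    // Fall back to a standard file function. For remote URLs this
    // requires allow_url_fopen to be enabled in php.ini.
    $body = file_get_contents($url);
    if ($body === false) {
        throw new RuntimeException("Could not retrieve $url");
    }
    return $body;
}
```

Calling fetch_url('http://www.example.com/') would return the page body as a string, or throw a RuntimeException if the retrieval fails. The PEAR HTTP_Request class is left out of the sketch because it has to be installed separately.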

Using a standard file function such as file_get_contents() is simple and convenient. It automatically follows redirects, so if you use this function to retrieve the directory http://www.example.com/people and ...
