Computer Science & Perl Programming by Jon Orwant

Chapter 56. Spidering an FTP Site

Gerard Lanois

This article is the result of my own personal adventures in maintaining a rapidly growing web site via FTP, without the benefit of a telnet shell on my server. If you have FTP access to your web server’s file tree, there are four reasons why mirroring with FTP instead of HTTP might be a better choice:

  1. Your ISP’s web server munges links and image paths in your HTML pages, so you can’t use HTTP to mirror the site.

  2. There is a cache between your HTTP client and your web server, making you retrieve out-of-date pages.

  3. Your web site contains dynamically generated content.

  4. You have data besides HTML pages and images, such as Perl programs.

This article demonstrates how to recursively traverse an FTP site using the Net::FTP module, which is bundled with Perl and available on CPAN. For the pedantically inclined, further background on FTP is available in RFC 959 (http://www.yahoo.com/Computers_and_Internet/Standards/RFCs/).
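To make the idea of a recursive traversal concrete, here is a minimal sketch and not the program developed in this chapter. It assumes that a name is a directory whenever cwd into it succeeds, which avoids parsing server-specific listing formats but can follow symbolic links in circles, hence the arbitrary depth limit. The walk_ftp function, its callback arguments, and the anonymous login address are hypothetical names chosen for illustration; new, login, cwd, cdup, ls, message, and quit are Net::FTP methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

# Recursively visit everything below the current remote directory.
# $ftp is any object offering Net::FTP-style ls, cwd, and cdup methods;
# $visit is called as $visit->($path, $is_directory) for every entry.
sub walk_ftp {
    my ($ftp, $path, $visit, $depth) = @_;
    $depth = 0 unless defined $depth;
    return if $depth > 20;    # arbitrary guard against symlink loops

    for my $name ($ftp->ls) {
        next if $name eq '.' or $name eq '..';
        my $full = $path =~ m{/$} ? "$path$name" : "$path/$name";
        if ($ftp->cwd($name)) {    # cwd succeeded: assume a directory
            $visit->($full, 1);
            walk_ftp($ftp, $full, $visit, $depth + 1);
            $ftp->cdup;            # return to the parent directory
        }
        else {                     # cwd failed: assume a plain file
            $visit->($full, 0);
        }
    }
}

# Connect only when a host is supplied: perl walk.pl HOST [START_DIR]
if (@ARGV) {
    my ($host, $start) = @ARGV;
    $start = '/' unless defined $start;

    my $ftp = Net::FTP->new($host)
        or die "Cannot connect to $host: $@";
    $ftp->login('anonymous', 'anonymous@example.com')
        or die "Login failed: ", $ftp->message;
    $ftp->cwd($start)
        or die "Cannot cwd to $start: ", $ftp->message;

    # Print each path, marking directories with a trailing slash.
    walk_ftp($ftp, $start, sub {
        my ($path, $is_dir) = @_;
        print $path, ($is_dir ? '/' : ''), "\n";
    });
    $ftp->quit;
}
```

Passing the visitor as a callback keeps the traversal separate from whatever is done with each entry, so the same walk could print a listing, compare timestamps, or fetch files. Probing with cwd costs one extra round trip per file, so on a large tree it may be worth parsing the long listing instead.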

Motivation

You may find yourself in the unenviable position of trying to maintain a remote file tree without shell access to the system where your file tree resides. Your file tree might contain a web site, an FTP site, or other data.

Many ISPs do not provide shell accounts, either for security reasons or because the host operating system (such as Windows, or old versions of Mac OS) has no concept of a remote login shell. If you take the login shell out of the equation and wish to automate the process of moving ...
