Non-Interactive Downloads Using wget
Manually saving individual pages from a browser works fine when you are only looking at a few. At some point you will want to automate the process, especially when you want to archive an entire site. wget is the perfect tool for automating these downloads, so I will spend a few pages describing how it can be used.
wget is a Unix command-line tool for the non-interactive download of web pages. You can download it from http://www.gnu.org/software/wget/ if your system does not already have it installed. A binary for Microsoft Windows is also available. It is a very flexible tool with a host of options listed in its manual page.
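If you are not sure whether wget is installed, or which options your build supports, you can check from the command line. This is a minimal sketch; the version string printed will of course differ on your system:
% wget --version    # prints the installed version, confirming wget is on your path
% man wget          # the manual page lists the full set of options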
Downloading a Single Page
Capturing a single web page with wget is straightforward. Give it a URL, with no other options, and it will download the page into the current working directory with the same filename as that on the web site:
% wget http://www.oreilly.com/index.html
--08:52:06-- http://www.oreilly.com/index.html
=> `index.html'
Resolving www.oreilly.com... 208.201.239.36, 208.201.239.37
Connecting to www.oreilly.com[208.201.239.36]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 54,774 [text/html]
100%[=====================================================>] 54,774    135.31K/s
08:52:07 (134.96 KB/s) - `index.html' saved [54774/54774]
Using the -nv option (non-verbose) suppresses most of these status messages, and the -q option silences it completely.
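For example, the same download can be made much quieter with either flag. This is a minimal sketch; the exact one-line summary printed by -nv varies with the wget version:
% wget -nv http://www.oreilly.com/index.html   # one-line summary per file
% wget -q http://www.oreilly.com/index.html    # no output at all; check the exit status instead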
Saving the file with the same name might be a problem ...