May 2017
Beginner
552 pages
28h 47m
English
The -dump option makes lynx fetch a web page, render it as plain text, and write the result to standard output. The following command redirects that text version of the page to a file:
$ lynx URL -dump > webpage_as_text.txt
The output lists every hyperlink (<a href="link">) separately under a References heading at the end of the text, so the links can be parsed on their own with regular expressions.
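As a rough sketch of the link parsing mentioned above, the following commands pull the URLs out of the References section with sed and grep. The file name sample_dump.txt and its contents are made up for illustration, and the pattern assumes each reference appears as a number, a period, and the URL, so it may need adjusting for the output of a particular lynx version.

```shell
# Hypothetical sample standing in for real output; an actual dump would
# come from: lynx -dump URL > sample_dump.txt
cat > sample_dump.txt <<'EOF'
Search [1]Images [2]Maps

References

   1. http://www.example.com/images
   2. http://www.example.com/maps
EOF

# Print everything from the References heading to the end of the file,
# keep only the numbered entries, then strip the leading "N. " prefix
# so that just the URLs remain.
links=$(sed -n '/^References$/,$p' sample_dump.txt \
        | grep -E '^ *[0-9]+\. ' \
        | sed -E 's/^ *[0-9]+\. //')

echo "$links"
```

Restricting the search to the lines after the References heading avoids matching numbered text that happens to appear in the body of the page.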
Consider this example:
$ lynx -dump http://google.com > plain_text_page.txt
You can view the plain-text version of the page using the cat command:
$ cat plain_text_page.txt
Search [1]Images [2]Maps [3]Play [4]YouTube [5]News [6]Gmail
[7]Drive
[8]More »
[9]Web History | [10]Settings | [11]Sign in
[12]St. Patrick's Day 2017
_______________________________________________________ ...