May 2013
Beginner to intermediate
384 pages
7h 40m
English
It is often useful to extract data from web pages by stripping out the details you do not need. sed and awk are the main tools that we will use for this task. You might have come across a list of actress rankings in a grep recipe in Chapter 4, Texting and Driving; it was generated by parsing the web page http://www.johntorres.net/BoxOfficefemaleList.html.
Let us see how we can parse the same data by using text-processing tools.
Let's go through the commands used to parse details of actresses from the website:
$ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | \
grep -o "Rank-.*" | \
sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | \
sort -nk 1 > actresslist.txt
The output will be as follows: ...
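To see what each stage contributes, the filtering part of the pipeline can be tried offline on a few made-up lines instead of the live page. This is only a sketch: the function name parse_ranks and the sample names are hypothetical, and the \t in the sed replacement assumes GNU sed. lynx -dump renders the page as plain text and -nolist drops the trailing list of links; grep -o keeps only the part of each line starting at "Rank-"; sed rewrites it as the rank number, a tab, and the rest of the line; sort -nk 1 orders the lines numerically by rank.

```shell
#!/bin/sh
# parse_ranks: hypothetical helper wrapping the grep/sed/sort stages of the recipe.
# Reads page text on stdin and prints "rank<TAB>name" lines sorted by rank.
parse_ranks() {
  # Keep only the text from "Rank-" to the end of each matching line
  grep -o "Rank-.*" |
  # Capture the digits after "Rank-" and the remaining text, join them with a tab
  # (\t in the replacement is a GNU sed extension)
  sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' |
  # Sort numerically on the first field so that 10 comes after 2
  sort -nk 1
}

# Made-up sample input standing in for the output of lynx -dump -nolist
printf '%s\n' \
  'Some heading text without a rank' \
  '   Rank-2   Example Actress B' \
  '   Rank-10  Example Actress C' \
  '   Rank-1   Example Actress A' |
parse_ranks
```

With this sample input, the heading line is dropped by grep and the remaining three lines come out in rank order 1, 2, 10. To run it against the real page, pipe the lynx command from the recipe into parse_ranks instead of the printf.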