Linux Shell Scripting Cookbook by Sarath Lakshman

Parsing data from a website

It is often useful to parse data from web pages by eliminating unnecessary details. sed and awk are the main tools that we will use for this task. You might have come across a list of actress rankings in a grep recipe in the previous chapter, Texting and Driving; it was generated by parsing the web page http://www.johntorres.net/BoxOfficefemaleList.html.

Let's see how to parse the same data using text-processing tools.

How to do it...

Let's go through the command sequence used to parse details of actresses from the website:

$ lynx -dump http://www.johntorres.net/BoxOfficefemaleList.html  | \
grep -o "Rank-.*" | \
sed 's/Rank-//; s/\[[0-9]\+\]//' | \
sort -nk 1 |\
 awk ' 
{
 for(i=3;i<=NF;i++){ $2=$2" "$i } 
 printf "%-4s ...
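
To see what the grep, sed, and sort stages of this pipeline do without fetching the page, the following is a minimal sketch that feeds them a few hypothetical lines standing in for lynx -dump output. The real layout of the page is an assumption here: the sample places a lynx-style [N] link marker before each name. The function name clean_rankings is introduced only for illustration, and the \+ in the sed expression is a GNU sed extension, so GNU sed is assumed.

```shell
#!/bin/bash
# clean_rankings is a hypothetical helper: it applies the recipe's grep, sed,
# and sort stages to whatever text arrives on stdin.
clean_rankings() {
  # 1. keep only the text from "Rank-" to the end of each matching line
  # 2. strip the "Rank-" prefix and the first [N] marker (\+ needs GNU sed)
  # 3. sort numerically on the leading rank number
  grep -o "Rank-.*" |
    sed 's/Rank-//; s/\[[0-9]\+\]//' |
    sort -nk 1
}

# Hypothetical stand-in for `lynx -dump` output; the real page layout is assumed.
sample='Top actresses
  Rank-2 [14]Jane Roe
  Rank-10 [21]Mary Ann Major
  Rank-1 [7]Alice Example
Footer text'

printf '%s\n' "$sample" | clean_rankings
```

With this sample, the lines without "Rank-" disappear and the remaining entries come out in numeric rank order as "1 Alice Example", "2 Jane Roe", and "10 Mary Ann Major". The -n flag matters here: a plain text sort would place 10 before 2.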

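The loop at the start of the awk program, for(i=3;i<=NF;i++){ $2=$2" "$i }, appends fields 3 through NF to field 2. After it runs, a multi-word name is held entirely in $2 while $1 still holds the rank. The sketch below isolates just that loop on already-cleaned lines. The helper name join_name_fields and the print statement with a | separator are illustrative choices made here; they are not the recipe's own printf format, which is cut off in the listing above.

```shell
#!/bin/bash
# join_name_fields is a hypothetical helper that isolates the recipe's awk loop:
# fields 3..NF are appended to $2, so $2 ends up holding the full name.
join_name_fields() {
  awk '
  {
    # Assigning to $2 rebuilds $0 but leaves NF and the other fields unchanged.
    for (i = 3; i <= NF; i++) { $2 = $2 " " $i }
    # Illustrative output only; this is not the printf format used in the recipe.
    print $1 "|" $2
  }'
}

printf '%s\n' '1 Alice Example' '10 Mary Ann Major' '2 Jane' | join_name_fields
```

This prints 1|Alice Example, 10|Mary Ann Major, and 2|Jane. For a single-word name the loop body never runs. Collapsing the name into one field lets a later format string treat it as a single column, whatever the number of words.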