Showing Data As a Quick and Easy Histogram

Problem

You need a quick screen-based histogram of some data.

Solution

Use the associative arrays of awk, as discussed in the previous recipe:

#
# cookbook filename: hist.awk
#
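# Return the largest value in arr. The extra parameters big and i
# are never passed by the caller; awk treats them as local variables.
function max(arr, big, i)
{
    big = 0;
    for (i in arr)
    {
        if (arr[i] > big) { big=arr[i];}
    }
    return big
}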

NF > 7 {
    user[$3]++
}
END {
    # for scaling
    maxm = max(user);
    for (i in user) {
        #printf "%s owns %d files\n", i, user[i]
        scaled = 60 * user[i] / maxm ;
        printf "%-10.10s [%8d]:", i, user[i]
        # use a separate counter so the array index i is not overwritten
        for (j=0; j<scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}

When we run it with the same input as the previous recipe, we get:

$ ls -lR /usr/local | awk -f hist.awk
bin       [      68]:#
albing    [    1801]:#######
root      [   13755]:##################################################
man       [   11491]:##########################################
$

Discussion

We could have put the code for max at the start of the END block, but we wanted to show you that you can define functions in awk. We are also using a slightly fancier printf. The string format %-10.10s left-justifies the username and pads it to 10 characters, but also truncates it at 10 characters. The integer format %8d ensures that the count is printed in an 8-character field. Together these give every histogram bar the same starting point, because each line uses the same amount of space regardless of the length of the username or the size of the integer.
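
To see the two format specifiers in isolation, here is a minimal sketch that runs the same printf format over two hand-picked values. The function name show_formats and the long username are made up for illustration; they are not part of the recipe.

```shell
# Hypothetical helper: print two sample name/count pairs using the
# same format string as hist.awk.
show_formats() {
    awk 'BEGIN {
        # a short name is left-justified and padded out to 10 characters
        printf "%-10.10s [%8d]:\n", "root", 13755
        # a long name is cut off after its first 10 characters
        printf "%-10.10s [%8d]:\n", "averylongusername", 42
    }'
}
show_formats
```

Both lines should come out the same width, with the long name shortened to averylongu, so any bar printed after the colon would start in the same column.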

Like all arithmetic in awk, the scaling calculation is done with floating point unless we explicitly truncate the result with a call to the built-in int() function. ...
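
As a rough sketch of that arithmetic, the function below compares the fractional scaled value, the number of marks a loop with the recipe's test (counter < scaled) would print, and what int() would return. The name bar_length is hypothetical, and the two counts are taken from the sample output above.

```shell
# Hypothetical helper: given a count and the maximum count, report the
# scaled value, the marks the loop would print, and the int() result.
bar_length() {
    awk -v count="$1" -v maxm="$2" 'BEGIN {
        scaled = 60 * count / maxm       # floating point, not truncated
        marks = 0
        for (j = 0; j < scaled; j++) {   # same test as the loop in hist.awk
            marks++
        }
        printf "scaled=%.4f marks=%d int=%d\n", scaled, marks, int(scaled)
    }'
}
bar_length 68 13755       # smallest count in the sample output
bar_length 13755 13755    # largest count in the sample output
```

For the smallest count the scaled value is below 1, so int() yields 0 while the loop still runs once and prints a single mark; for the largest count all three values agree at 60.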
