Chapter 7. Data Analysis

In the previous chapters, we used scripts to collect data and prepare it for analysis. Now we need to make sense of it all. When analyzing large amounts of data, it often helps to start broad and then narrow the search as you gain new insights into the data.

In this chapter, we use data from web server logs as input to our scripts. This is simply for demonstration purposes; the scripts and techniques can easily be modified to work with nearly any type of data.

Commands in Use

We introduce sort, head, and uniq to limit the data we need to process and display. The file in Example 7-1 will be used for command examples.

Example 7-1. file1.txt
12/05/2017 192.168.10.14 test.html
12/30/2017 192.168.10.185 login.html
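If you want to try the commands in this section yourself, one simple way to recreate file1.txt is with a here-document. This is a minimal sketch that assumes you are working in a writable current directory; the file contents are exactly the two lines of Example 7-1.

```bash
#!/usr/bin/env bash
# Recreate file1.txt from Example 7-1 in the current directory.
# Each line holds a date, an IP address, and a filename.
# The quoted 'EOF' delimiter prevents the shell from expanding anything inside.
cat > file1.txt <<'EOF'
12/05/2017 192.168.10.14 test.html
12/30/2017 192.168.10.185 login.html
EOF
```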

sort

The sort command rearranges the lines of a text file into numerical and alphabetical order. By default, sort arranges lines in ascending order, starting with numbers and then letters. How uppercase and lowercase letters are ordered depends on your locale settings: in the C locale, for example, uppercase letters are placed before lowercase letters, while other locales may order them differently.
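A quick way to see the default ordering is to pipe a few hypothetical words and numbers through sort with no options. This sketch sets LC_ALL=C so that plain byte-order collation is used and the result does not vary between systems; under another locale the relative order of "Apple" and "apple" may differ. Note also that without -n the comparison is character by character, so 10 lands before 2.

```bash
#!/usr/bin/env bash
# Sort a handful of sample values with default options under the C locale.
sorted=$(printf '%s\n' banana 10 Apple 2 apple | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints:
# 10
# 2
# Apple
# apple
# banana
```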

Common command options

-r

Sort in descending order.

-f

Ignore case.

-n

Use numerical ordering, so that 1, 2, 3 all sort before 10. (In the default alphabetic sorting, 2 and 3 would appear after 10.)

-k

Sort based on a subset of the data (key) in a line. By default, fields are delimited by whitespace.

-o

Write output to a specified file.
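The options above can be tried on small inline inputs. The sketch below pipes hypothetical values through sort with each option in turn; the sample values and the output filename sorted.txt are illustrative assumptions, and LC_ALL=C is exported so the results do not depend on the locale. The key specification 2,2n limits the key to field 2 alone and compares it numerically.

```bash
#!/usr/bin/env bash
# Use byte-order (C locale) collation so results do not vary by locale.
export LC_ALL=C

# -n: numeric order, so 2 and 3 come before 10.
printf '%s\n' 10 3 2 1 | sort -n
# prints 1, 2, 3, 10 (one per line)

# Without -n, comparison is character by character, so 10 precedes 2.
printf '%s\n' 10 3 2 1 | sort
# prints 1, 10, 2, 3 (one per line)

# -r combined with -n: descending numeric order.
printf '%s\n' 10 3 2 1 | sort -rn
# prints 10, 3, 2, 1 (one per line)

# -f: ignore case, so "Cherry" is placed among the lowercase words.
printf '%s\n' banana Cherry apple | sort -f
# prints apple, banana, Cherry (one per line)

# -k: sort on field 2 only (2,2), compared numerically (n).
printf '%s\n' 'b 10' 'a 9' 'c 2' | sort -k 2,2n
# prints "c 2", "a 9", "b 10" (one per line)

# -o: write the result to a file instead of standard output.
printf '%s\n' 10 3 2 1 | sort -rn -o sorted.txt
```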

Command example

To sort file1.txt by the filename column and ignore the IP address column, you would ...
