Chapter 6. Data Processing

In the previous chapter, you gathered lots of data. That data is likely in a variety of formats, including free-form text, comma-separated values (CSV), and XML. In this chapter, we show you how to parse and manipulate that data so you can extract key elements for analysis.

Commands in Use

We introduce awk, join, sed, tail, and tr to prepare data for analysis.

awk

awk is not just a command; it is a programming language designed for processing text. Entire books are devoted to it. awk is explained in more detail throughout this book, but here we provide a brief example of its usage.
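Because awk is a full language, a program can keep its own variables and run code after all input has been read. The following is a minimal sketch that counts how many lines have Jones as the second field, using awk's standard END pattern. The input is generated inline with printf purely for illustration.

```shell
# Count the lines whose second field is "Jones".
# count is an ordinary awk variable; it starts out empty, so "+ 0"
# forces a numeric 0 to be printed if nothing matched.
printf 'Mike Jones\nJohn Smith\nKathy Jones\n' |
  awk '$2 == "Jones" { count++ } END { print count + 0 }'
# prints 2
```

The END block runs once, after the last input line, which makes it a natural place to report totals.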

Common command options

-f

Read in the awk program from a specified file
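As a sketch of how -f is used, the program below is stored in its own file rather than typed on the command line. The file names jones.awk and users_sample.txt are hypothetical, chosen only for this example.

```shell
# Write the awk program to a file. The quoted 'EOF' stops the shell
# from expanding $2 and $0 inside the here-document.
cat > jones.awk <<'EOF'
$2 == "Jones" { print $0 }
EOF

# Sample input in the same "first last" format used in this chapter.
printf 'Mike Jones\nJohn Smith\nKathy Jones\n' > users_sample.txt

# -f tells awk to read its program from jones.awk.
awk -f jones.awk users_sample.txt
# prints:
# Mike Jones
# Kathy Jones
```

Keeping longer programs in a file avoids shell-quoting problems and makes them easier to reuse.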

Command example

Take a look at the file awkusers.txt in Example 6-1.

Example 6-1. awkusers.txt
Mike Jones
John Smith
Kathy Jones
Jane Kennedy
Tim Scott

You can use awk to print each line where the user’s last name is Jones.

$ awk '$2 == "Jones" {print $0}' awkusers.txt

Mike Jones
Kathy Jones

awk iterates through each line of the input file, splitting it into fields (separated by whitespace by default). Field $0 represents the entire line, $1 the first word, $2 the second word, and so on. An awk program consists of patterns and corresponding code to be executed when a pattern is matched. In this example, there is only one pattern. We test $2 to see if that field is equal to Jones. If it is, awk runs the code in the braces which, in this case, will print the ...
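To make the field numbering concrete, the sketch below reorders the fields of each line and also shows that a pattern with no action defaults to printing the whole line ($0). The file name fields_sample.txt is hypothetical.

```shell
# Sample input in the same "first last" format as Example 6-1.
printf 'Mike Jones\nJohn Smith\n' > fields_sample.txt

# $1 is the first whitespace-separated word, $2 the second.
# Print each name as "last, first".
awk '{ print $2 ", " $1 }' fields_sample.txt
# prints:
# Jones, Mike
# Smith, John

# With the action omitted, awk prints the entire line ($0)
# for every line that matches the pattern.
awk '$2 == "Jones"' fields_sample.txt
# prints:
# Mike Jones
```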
