Chapter 6. Data Processing
In the previous chapter, you gathered lots of data. That data is likely in a variety of formats, including free-form text, comma-separated values (CSV), and XML. In this chapter, we show you how to parse and manipulate that data so you can extract key elements for analysis.
Commands in Use
We introduce awk, join, sed, tail, and tr to prepare data for analysis.
awk
awk is not just a command, but actually a programming language designed for processing text. Entire books are dedicated to this subject. awk will be explained in more detail throughout this book, but here we provide a brief example of its usage.
Common command options
- -f: Read in the awk program from a specified file
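As a minimal sketch of the -f option, the following stores a one-line awk program in a file and runs it against a small input. The file names first_word.awk and sample.txt, and their contents, are hypothetical and chosen only for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Hypothetical input file with two whitespace-separated words per line.
printf '%s\n' 'alpha one' 'beta two' > "$workdir/sample.txt"

# Hypothetical awk program file: print the first word of every line.
cat > "$workdir/first_word.awk" <<'EOF'
{ print $1 }
EOF

# -f tells awk to read its program from the file instead of the command line.
first_words=$(awk -f "$workdir/first_word.awk" "$workdir/sample.txt")
echo "$first_words"
```

Keeping the program in its own file is mostly a convenience once the program grows beyond what fits comfortably inside a quoted command-line argument.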
Command example
Take a look at the file awkusers.txt in Example 6-1.
Example 6-1. awkusers.txt
Mike Jones
John Smith
Kathy Jones
Jane Kennedy
Tim Scott
You can use awk to print each line where the user’s last name is Jones.
$ awk '$2 == "Jones" {print $0}' awkusers.txt
Mike Jones
Kathy Jones
awk will iterate through each line of the input file, reading in each word (separated by whitespace by default) into fields. Field $0 represents the entire line, $1 the first word, $2 the second word, and so on.
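The field numbering can be checked with a short sketch. It feeds awk a single line taken from awkusers.txt and prints each field with a label. The labels, the variable names, and the comma-separated variant are assumptions added for illustration; NF is awk's built-in count of fields, and -F is the standard option for changing the field separator.

```shell
# One sample line from awkusers.txt (Example 6-1).
line='Kathy Jones'

# $0 is the whole line, $1 and $2 are the first and second words,
# and NF holds the number of fields found on the line.
fields=$(printf '%s\n' "$line" |
    awk '{ print "whole=" $0; print "first=" $1; print "second=" $2; print "count=" NF }')
echo "$fields"

# Whitespace is only the default separator; -F sets a different one,
# for example a comma for CSV-style input.
csv_last=$(printf 'Kathy,Jones\n' | awk -F',' '{ print $2 }')
echo "$csv_last"
```

The first command should print the full line, then Kathy, Jones, and a field count of 2; the second should print Jones.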
An awk program consists of patterns and corresponding code to be executed when that pattern is matched.
In this example, there is only one pattern. We test $2 to see if that field is equal to Jones. If it is, awk will run the code in the braces, which, in this case, will print the ...
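To illustrate the pattern-and-code structure with more than one rule, here is a small sketch that recreates awkusers.txt from Example 6-1 in a scratch directory and applies two patterns to it. The second pattern (matching Smith) and the output labels are assumptions for illustration, not part of the original example.

```shell
# Recreate awkusers.txt from Example 6-1 in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'Mike Jones' 'John Smith' 'Kathy Jones' 'Jane Kennedy' 'Tim Scott' \
    > "$workdir/awkusers.txt"

# Each rule is a pattern followed by code in braces. awk tests every
# rule against every input line and runs the code of each rule that matches.
report=$(awk '
    $2 == "Jones" { print "Jones: " $1 }   # last name is Jones
    $2 == "Smith" { print "Smith: " $1 }   # last name is Smith
' "$workdir/awkusers.txt")
echo "$report"
```

Lines that match neither pattern (Jane Kennedy and Tim Scott) produce no output, because no rule's code runs for them.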