Chapter 1. Combining Commands
When you work in Windows, macOS, and most other operating systems, you probably spend your time running applications like web browsers, word processors, spreadsheets, and games. A typical application is packed with features: everything that the designers thought their users would need. So, most applications are self-sufficient. They don’t rely on other apps. You might copy and paste between applications from time to time, but for the most part, they’re separate.
The Linux command line is different. Instead of big applications with tons of features, Linux supplies thousands of small commands with very few features. The command cat, for example, prints files on the screen and that’s about it. ls lists the files in a directory, mv renames files, and so on. Each command has a simple, fairly well-defined purpose.
What if you need to do something more complicated? Don’t worry. Linux makes it easy to combine commands so their individual features work together to accomplish your goal. This way of working yields a very different mindset about computing. Instead of asking “Which app should I launch?” to achieve some result, the question becomes “Which commands should I combine?”
In this chapter, you’ll learn how to arrange and run commands in different combinations to do what you need. To keep things simple, I’ll introduce just six Linux commands and their most basic uses so you can focus on the more complex and interesting part—combining them—without a huge learning curve. It’s a bit like learning to cook with six ingredients, or learning carpentry with just a hammer and a saw. (I’ll add more commands to your Linux toolbox in Chapter 5.)
You’ll combine commands using pipes, a Linux feature that connects the output of one command to the input of another. As I introduce each command (wc, head, cut, grep, sort, and uniq), I’ll immediately demonstrate its use with pipes. Some examples will be practical for daily Linux use, while others are just toy examples to demonstrate an important feature.
Input, Output, and Pipes
Most Linux commands read input from the keyboard, write output to the screen, or both. Linux has fancy names for this reading and writing:
- stdin (pronounced “standard input” or “standard in”): The stream of input that Linux reads from your keyboard. When you type any command at a prompt, you’re supplying data on stdin.
- stdout (pronounced “standard output” or “standard out”): The stream of output that Linux writes to your display. When you run the ls command to print filenames, the results appear on stdout.
Now comes the cool part. You can connect the stdout of one command to the stdin of another, so the first command feeds the second. Let’s begin with the familiar ls -l command to list a large directory, such as /bin, in long format:

$ ls -l /bin
total 12104
-rwxr-xr-x 1 root root 1113504 Jun  6  2019 bash
-rwxr-xr-x 1 root root  170456 Sep 21  2019 bsd-csh
-rwxr-xr-x 1 root root   34888 Jul  4  2019 bunzip2
-rwxr-xr-x 1 root root 2062296 Sep 18  2020 busybox
-rwxr-xr-x 1 root root   34888 Jul  4  2019 bzcat
⋮
-rwxr-xr-x 1 root root    5047 Apr 27  2017 znew
This directory contains far more files than your display has lines, so the output quickly scrolls off-screen. It’s a shame that ls can’t print the information one screenful at a time, pausing until you press a key to continue. But wait: another Linux command has that feature. The less command displays a file one screenful at a time:

$ less myfile        View the file; press q to quit
You can connect these two commands because ls writes to stdout and less can read from stdin. Use a pipe to send the output of ls to the input of less:
$ ls -l /bin | less
This combined command displays the directory’s contents one screenful at a time. The vertical bar (|) between the commands is the Linux pipe symbol.1 It connects the first command’s stdout to the next command’s stdin. Any command line containing pipes is called a pipeline.

Commands generally are not aware that they’re part of a pipeline. ls believes it’s writing to the display, when in fact its output has been redirected to less. And less believes it’s reading from the keyboard when it’s actually reading the output of ls.
Six Commands to Get You Started
Pipes are an essential part of Linux expertise. Let’s dive into building your piping skills with a small set of Linux commands so no matter which ones you encounter later, you’re ready to combine them.
The six commands—wc, head, cut, grep, sort, and uniq—have numerous options and modes of operation that I’ll largely skip for now to focus on pipes. To learn more about any command, run the man command to display full documentation. For example:
$ man wc
To demonstrate our six commands in action, I’ll use a file named animals.txt that lists some O’Reilly book information, shown in Example 1-1.
Example 1-1. Inside the file animals.txt
python  Programming Python       2010    Lutz, Mark
snail   SSH, The Secure Shell    2005    Barrett, Daniel
alpaca  Intermediate Perl        2012    Schwartz, Randal
robin   MySQL High Availability  2014    Bell, Charles
horse   Linux in a Nutshell      2009    Siever, Ellen
donkey  Cisco IOS in a Nutshell  2005    Boney, James
oryx    Writing Word Macros      1999    Roman, Steven
Each line contains four facts about an O’Reilly book, separated by a single tab character: the animal on the front cover, the book title, the year of publication, and the name of the first author.
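If you’d like to follow along at your own command line, you can recreate a compatible animals.txt yourself. This sketch uses printf, whose format string (here with literal tab characters, \t) repeats for each group of four arguments, producing one line per book:

```shell
# Recreate animals.txt: four tab-separated fields per line
# (cover animal, title, year, first author)
printf '%s\t%s\t%s\t%s\n' \
  python 'Programming Python'      2010 'Lutz, Mark' \
  snail  'SSH, The Secure Shell'   2005 'Barrett, Daniel' \
  alpaca 'Intermediate Perl'       2012 'Schwartz, Randal' \
  robin  'MySQL High Availability' 2014 'Bell, Charles' \
  horse  'Linux in a Nutshell'     2009 'Siever, Ellen' \
  donkey 'Cisco IOS in a Nutshell' 2005 'Boney, James' \
  oryx   'Writing Word Macros'     1999 'Roman, Steven' \
  > animals.txt

# wc should report 7 lines, 51 words, 325 characters
wc animals.txt
```

The single quotes keep each multiword title or author name together as one printf argument, so each becomes exactly one tab-separated field.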
Command #1: wc
The wc command prints the number of lines, words, and characters in a file:

$ wc animals.txt
  7  51 325 animals.txt
wc reports that the file animals.txt has 7 lines, 51 words, and 325 characters. If you count the characters by eye, including spaces and tabs, you’ll find only 318 characters, but wc also includes the invisible newline character that ends each line.
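You can verify the newline’s effect yourself with printf, which writes exactly the characters you give it and nothing more:

```shell
printf 'hello' | wc -c      # 5 characters: no newline included
printf 'hello\n' | wc -c    # 6 characters: "hello" plus one newline
```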
The options -l, -w, and -c instruct wc to print only the number of lines, words, and characters, respectively:

$ wc -l animals.txt
7 animals.txt
$ wc -w animals.txt
51 animals.txt
$ wc -c animals.txt
325 animals.txt
Counting is such a useful, general-purpose task that the authors of wc designed the command to work with pipes. It reads from stdin if you omit the filename, and it writes to stdout. Let’s use ls to list the contents of the current directory and pipe them to wc to count lines. This pipeline answers the question, “How many files are visible in my current directory?”

$ ls -1
animals.txt
myfile
myfile2
test.py
$ ls -1 | wc -l
4
The option -1, which tells ls to print its results in a single column, is not strictly necessary here. To learn why I used it, see the sidebar “ls Changes Its Behavior When Redirected”.
wc is the first command you’ve seen in this chapter, so you’re a bit limited in what you can do with pipes. Just for fun, pipe the output of wc to itself, demonstrating that the same command can appear more than once in a pipeline. This combined command reports that the number of words in the output of wc is four: three integers and a filename:

$ wc animals.txt
  7  51 325 animals.txt
$ wc animals.txt | wc -w
4
Why stop there? Add a third wc to the pipeline and count lines, words, and characters in the output “4”:

$ wc animals.txt | wc -w | wc
      1       1       2
The output indicates one line (containing the number 4), one word (the number 4 itself), and two characters. Why two? Because the line “4” ends with an invisible newline character.
That’s enough silly pipelines with wc. As you gain more commands, the pipelines will become more practical.
Command #2: head
The head command prints the first lines of a file. Print the first three lines of animals.txt with head using the option -n:

$ head -n3 animals.txt
python  Programming Python     2010    Lutz, Mark
snail   SSH, The Secure Shell  2005    Barrett, Daniel
alpaca  Intermediate Perl      2012    Schwartz, Randal
If you request more lines than the file contains, head prints the whole file (like cat does). If you omit the -n option, head defaults to 10 lines (-n10).
By itself, head is handy for peeking at the top of a file when you don’t care about the rest of the contents. It’s a speedy and efficient command, even for very large files, because it needn’t read the whole file. In addition, head writes to stdout, making it useful in pipelines. Count the number of words in the first three lines of animals.txt:

$ head -n3 animals.txt | wc -w
20
head can also read from stdin for more pipeline fun. A common use is to reduce the output from another command when you don’t care to see all of it, like a long directory listing. For example, list the first five filenames in the /bin directory:

$ ls /bin | head -n5
bash
bsd-csh
bunzip2
busybox
bzcat
Command #3: cut
The cut command prints one or more columns from a file. For example, print all book titles from animals.txt, which appear in the second column:

$ cut -f2 animals.txt
Programming Python
SSH, The Secure Shell
Intermediate Perl
MySQL High Availability
Linux in a Nutshell
Cisco IOS in a Nutshell
Writing Word Macros
cut provides two ways to define what a “column” is. The first is to cut by field (-f), when the input consists of strings (fields) each separated by a single tab character. Conveniently, that is exactly the format of the file animals.txt. The preceding cut command prints the second field of each line, thanks to the option -f2.
To shorten the output, pipe it to head to print only the first three lines:

$ cut -f2 animals.txt | head -n3
Programming Python
SSH, The Secure Shell
Intermediate Perl
You can also cut multiple fields, either by separating their field numbers with commas:
$ cut -f1,3 animals.txt | head -n3
python  2010
snail   2005
alpaca  2012
or by numeric range:
$ cut -f2-4 animals.txt | head -n3
Programming Python     2010    Lutz, Mark
SSH, The Secure Shell  2005    Barrett, Daniel
Intermediate Perl      2012    Schwartz, Randal
The second way to define a “column” for cut is by character position, using the -c option. Print the first three characters from each line of the file, which you can specify either with commas (1,2,3) or as a range (1-3):

$ cut -c1-3 animals.txt
pyt
sna
alp
rob
hor
don
ory
Now that you’ve seen the basic functionality, try something more practical with cut and pipes. Imagine that the animals.txt file is thousands of lines long, and you need to extract just the authors’ last names. First, isolate the fourth field, author name:

$ cut -f4 animals.txt
Lutz, Mark
Barrett, Daniel
Schwartz, Randal
⋮
Then pipe the results to cut again, using the option -d (meaning “delimiter”) to change the separator character to a comma instead of a tab, to isolate the authors’ last names:

$ cut -f4 animals.txt | cut -d, -f1
Lutz
Barrett
Schwartz
⋮
Save Time with Command History and Editing
Are you retyping a lot of commands? Press the up arrow key instead, repeatedly, to scroll through commands you’ve run before. (This shell feature is called command history.) When you reach the desired command, press Enter to run it immediately, or edit it first using the left and right arrow keys to position the cursor and the Backspace key to delete. (This feature is command-line editing.)
I’ll discuss much more powerful features for command history and editing in Chapter 3.
Command #4: grep
grep is an extremely powerful command, but for now I’ll hide most of its capabilities and say it prints lines that match a given string. (More detail will come in Chapter 5.) For example, the following command displays lines from animals.txt that contain the string Nutshell:

$ grep Nutshell animals.txt
horse   Linux in a Nutshell      2009    Siever, Ellen
donkey  Cisco IOS in a Nutshell  2005    Boney, James
You can also print lines that don’t match a given string, with the -v option. Notice the lines containing “Nutshell” are absent:

$ grep -v Nutshell animals.txt
python  Programming Python       2010    Lutz, Mark
snail   SSH, The Secure Shell    2005    Barrett, Daniel
alpaca  Intermediate Perl        2012    Schwartz, Randal
robin   MySQL High Availability  2014    Bell, Charles
oryx    Writing Word Macros      1999    Roman, Steven
In general, grep is useful for finding text in a collection of files. The following command prints lines that contain the string Perl in files with names ending in .txt:

$ grep Perl *.txt
animals.txt:alpaca      Intermediate Perl       2012    Schwartz, Randal
essay.txt:really love the Perl programming language, which is
essay.txt:languages such as Perl, Python, PHP, and Ruby
In this case, grep found three matching lines, one in animals.txt and two in essay.txt.
grep reads stdin and writes stdout, making it great for pipelines. Suppose you want to know how many subdirectories are in the large directory /usr/lib. There is no single Linux command to provide that answer, so construct a pipeline. Begin with the ls -l command:

$ ls -l /usr/lib
drwxrwxr-x 12 root root  4096 Mar  1  2020 4kstogram
drwxr-xr-x  3 root root  4096 Nov 30  2020 GraphicsMagick-1.4
drwxr-xr-x  4 root root  4096 Mar 19  2020 NetworkManager
-rw-r--r--  1 root root 35568 Dec  1  2017 attica_kde.so
-rwxr-xr-x  1 root root   684 May  5  2018 cnf-update-db
⋮
Notice that ls -l marks directories with a d at the beginning of the line. Use cut to isolate the first column, which may or may not be a d:

$ ls -l /usr/lib | cut -c1
d
d
d
-
-
⋮
Then use grep to keep only the lines containing d:

$ ls -l /usr/lib | cut -c1 | grep d
d
d
d
⋮
Finally, count lines with wc, and you have your answer, produced by a four-command pipeline—/usr/lib contains 145 subdirectories:

$ ls -l /usr/lib | cut -c1 | grep d | wc -l
145
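Pipelines often have more than one workable shape. As a sketch of an alternative, grep can match and count in one step with its -c option, and the caret (^) anchors the match to the start of each line—a regular-expression feature that Chapter 5 covers in detail—so the cut stage becomes unnecessary:

```shell
# Count the lines of "ls -l" output that begin with "d":
# -c makes grep print a count of matching lines instead of the lines,
# and ^d matches a "d" only at the start of a line
ls -l /usr/lib | grep -c '^d'
```

The count it prints should match the four-command pipeline’s answer on the same directory.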
Command #5: sort
The sort command reorders the lines of a file into ascending order (the default):

$ sort animals.txt
alpaca  Intermediate Perl        2012    Schwartz, Randal
donkey  Cisco IOS in a Nutshell  2005    Boney, James
horse   Linux in a Nutshell      2009    Siever, Ellen
oryx    Writing Word Macros      1999    Roman, Steven
python  Programming Python       2010    Lutz, Mark
robin   MySQL High Availability  2014    Bell, Charles
snail   SSH, The Secure Shell    2005    Barrett, Daniel
or descending order (with the -r option):

$ sort -r animals.txt
snail   SSH, The Secure Shell    2005    Barrett, Daniel
robin   MySQL High Availability  2014    Bell, Charles
python  Programming Python       2010    Lutz, Mark
oryx    Writing Word Macros      1999    Roman, Steven
horse   Linux in a Nutshell      2009    Siever, Ellen
donkey  Cisco IOS in a Nutshell  2005    Boney, James
alpaca  Intermediate Perl        2012    Schwartz, Randal
sort can order the lines alphabetically (the default) or numerically (with the -n option). I’ll demonstrate this with pipelines that cut the third field in animals.txt, the year of publication:

$ cut -f3 animals.txt              Unsorted
2010
2005
2012
2014
2009
2005
1999
$ cut -f3 animals.txt | sort -n    Ascending
1999
2005
2005
2009
2010
2012
2014
$ cut -f3 animals.txt | sort -nr   Descending
2014
2012
2010
2009
2005
2005
1999
To learn the year of the most recent book in animals.txt, pipe the output of sort to the input of head and print just the first line:

$ cut -f3 animals.txt | sort -nr | head -n1
2014
Maximum and Minimum Values
sort and head are powerful partners when working with numeric data, one value per line. You can print the maximum value by piping the data to:
... | sort -nr | head -n1
and print the minimum value with:
... | sort -n | head -n1
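For instance, here is the pattern in action on a small stream of numbers generated with seq, a command that prints a sequence of integers:

```shell
seq 5                          # prints 1 through 5, one per line
seq 5 | sort -nr | head -n1    # maximum: 5
seq 5 | sort -n | head -n1     # minimum: 1
```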
As another example, let’s play with the file /etc/passwd, which lists the users that can run processes on the system.4 You’ll generate a list of all users in alphabetical order. Peeking at the first five lines, you see something like this:
$ head -n5 /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
smith:x:1000:1000:Aisha Smith,,,:/home/smith:/bin/bash
jones:x:1001:1001:Bilbo Jones,,,:/home/jones:/bin/bash
Each line consists of strings separated by colons, and the first string is the username, so you can isolate the usernames with the cut command:

$ head -n5 /etc/passwd | cut -d: -f1
root
daemon
bin
smith
jones
and sort them:
$ head -n5 /etc/passwd | cut -d: -f1 | sort
bin
daemon
jones
root
smith
To produce the sorted list of all usernames, not just the first five, replace head with cat:
$ cat /etc/passwd | cut -d: -f1 | sort
To detect if a given user has an account on your system, match their username with grep. Empty output means no account:

$ cut -d: -f1 /etc/passwd | grep -w jones
jones
$ cut -d: -f1 /etc/passwd | grep -w rutabaga
(produces no output)
The -w option instructs grep to match full words only, not partial words, in case your system also has a username that contains “jones”, such as sallyjones2.
Command #6: uniq
The uniq command detects repeated, adjacent lines in a file. By default, it removes the repeats. I’ll demonstrate this with a simple file containing capital letters:

$ cat letters
A
A
A
B
B
A
C
C
C
C
$ uniq letters
A
B
A
C
Notice that uniq reduced the first three A lines to a single A, but it left the last A in place because it wasn’t adjacent to the first three.
You can also count occurrences with the -c option:

$ uniq -c letters
      3 A
      2 B
      1 A
      4 C
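To experiment yourself, you can recreate the letters file; printf’s %s\n format repeats for each argument, writing one letter per line:

```shell
printf '%s\n' A A A B B A C C C C > letters
uniq letters        # prints A, B, A, C (one per line)
uniq -c letters     # same lines, each preceded by its count
```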
I’ll admit, when I first encountered the uniq command, I didn’t see much use in it, but it quickly became one of my favorites. Suppose you have a tab-separated file of students’ final grades for a university course, ranging from A (best) to F (worst):

$ cat grades
C       Geraldine
B       Carmine
A       Kayla
A       Sophia
B       Haresh
C       Liam
B       Elijah
B       Emma
A       Olivia
D       Noah
F       Ava
You’d like to print the grade with the most occurrences. (If there’s a tie, print just one of the winners.) Begin by isolating the grades with cut and sorting them:

$ cut -f1 grades | sort
A
A
A
B
B
B
B
C
C
D
F
Next, use uniq to count adjacent lines:

$ cut -f1 grades | sort | uniq -c
      3 A
      4 B
      2 C
      1 D
      1 F
Then sort the lines in reverse order, numerically, to move the most frequently occurring grade to the top line:
$ cut -f1 grades | sort | uniq -c | sort -nr
      4 B
      3 A
      2 C
      1 F
      1 D
and keep just the first line with head:

$ cut -f1 grades | sort | uniq -c | sort -nr | head -n1
      4 B
Finally, since you want just the letter grade, not the count, isolate the grade with cut:

$ cut -f1 grades | sort | uniq -c | sort -nr | head -n1 | cut -c9
B
and there’s your answer, thanks to a six-command pipeline—our longest yet. This sort of step-by-step pipeline construction is not just an educational exercise. It’s how Linux experts actually work. Chapter 8 is devoted to this technique.
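If you’d like to reproduce the result, here’s a sketch that recreates a compatible grades file with printf and runs the whole pipeline. Note one assumption: cut -c9 relies on uniq -c printing each count right-justified in a seven-character column followed by a space, which GNU uniq does:

```shell
# Recreate the grades file: tab-separated grade and student name
printf '%s\t%s\n' \
  C Geraldine B Carmine A Kayla A Sophia B Haresh C Liam \
  B Elijah B Emma A Olivia D Noah F Ava > grades

# The six-command pipeline; cut -c9 assumes GNU uniq's count format
cut -f1 grades | sort | uniq -c | sort -nr | head -n1 | cut -c9
```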
Detecting Duplicate Files
Let’s combine what you’ve learned with a larger example. Suppose you’re in a directory full of JPEG files and you want to know if any are duplicates:
$ ls
image001.jpg  image005.jpg  image009.jpg  image013.jpg  image017.jpg
image002.jpg  image006.jpg  image010.jpg  image014.jpg  image018.jpg
⋮
You can answer this question with a pipeline. You’ll need another command, md5sum, which examines a file’s contents and computes a 32-character string called a checksum:

$ md5sum image001.jpg
146b163929b6533f02e91bdf21cb9563  image001.jpg
A given file’s checksum, for mathematical reasons, is very, very likely to be unique. If two files have the same checksum, therefore, they are almost certainly duplicates. Here, md5sum indicates the first and third files are duplicates:

$ md5sum image001.jpg image002.jpg image003.jpg
146b163929b6533f02e91bdf21cb9563  image001.jpg
63da88b3ddde0843c94269638dfa6958  image002.jpg
146b163929b6533f02e91bdf21cb9563  image003.jpg
Duplicate checksums are easy to detect by eye when there are only three files, but what if you have three thousand? It’s pipes to the rescue. Compute all the checksums, use cut to isolate the first 32 characters of each line, and sort the lines to make any duplicates adjacent:

$ md5sum *.jpg | cut -c1-32 | sort
1258012d57050ef6005739d0e6f6a257
146b163929b6533f02e91bdf21cb9563
146b163929b6533f02e91bdf21cb9563
17f339ed03733f402f74cf386209aeb3
⋮
Now add uniq to count repeated lines:

$ md5sum *.jpg | cut -c1-32 | sort | uniq -c
      1 1258012d57050ef6005739d0e6f6a257
      2 146b163929b6533f02e91bdf21cb9563
      1 17f339ed03733f402f74cf386209aeb3
⋮
If there are no duplicates, all of the counts produced by uniq will be 1. Sort the results numerically from high to low, and any counts greater than 1 will appear at the top of the output:

$ md5sum *.jpg | cut -c1-32 | sort | uniq -c | sort -nr
      3 f6464ed766daca87ba407aede21c8fcc
      2 c7978522c58425f6af3f095ef1de1cd5
      2 146b163929b6533f02e91bdf21cb9563
      1 d8ad913044a51408ec1ed8a204ea9502
⋮
Now let’s remove the nonduplicates. Their checksums are preceded by six spaces, the number one, and a single space. We’ll use grep -v to remove these lines:5

$ md5sum *.jpg | cut -c1-32 | sort | uniq -c | sort -nr | grep -v "      1 "
      3 f6464ed766daca87ba407aede21c8fcc
      2 c7978522c58425f6af3f095ef1de1cd5
      2 146b163929b6533f02e91bdf21cb9563
Finally, you have your list of duplicate checksums, sorted by the number of occurrences, produced by a beautiful six-command pipeline. If it produces no output, there are no duplicate files.
This command would be even more useful if it displayed the filenames of the duplicates, but that operation requires features we haven’t discussed yet. (You’ll learn them in “Improving the duplicate file detector”.) For now, identify the files having a given checksum by searching with grep:

$ md5sum *.jpg | grep 146b163929b6533f02e91bdf21cb9563
146b163929b6533f02e91bdf21cb9563  image001.jpg
146b163929b6533f02e91bdf21cb9563  image003.jpg
and cleaning up the output with cut:

$ md5sum *.jpg | grep 146b163929b6533f02e91bdf21cb9563 | cut -c35-
image001.jpg
image003.jpg
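To try the duplicate detector end to end without a directory of real photos, you can sketch a test setup of your own; the directory name, filenames, and file contents here are invented for the demonstration:

```shell
# Create three tiny "images", two with identical contents
mkdir -p dupdemo
echo 'red fish'  > dupdemo/a.jpg
echo 'blue fish' > dupdemo/b.jpg
echo 'red fish'  > dupdemo/c.jpg    # same contents as a.jpg

# Any checksum appearing more than once marks a set of duplicates;
# this should print one line: a count of 2 and the shared checksum
md5sum dupdemo/*.jpg | cut -c1-32 | sort | uniq -c | sort -nr | grep -v "      1 "
```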
Summary
You’ve now seen the power of stdin, stdout, and pipes. They turn a small handful of commands into a collection of composable tools, proving that the whole is greater than the sum of the parts. Any command that reads stdin or writes stdout can participate in pipelines.6 As you learn more commands, you can apply the general concepts from this chapter to forge your own powerful combinations.
1 On US keyboards, the pipe symbol is on the same key as the backslash (\), usually located between the Enter and Backspace keys or between the left Shift key and Z.
2 The POSIX standard calls this form of command a utility.
3 Depending on your setup, ls may also use other formatting features, such as color, when printing to the screen but not when redirected.
4 Some Linux systems store the user information elsewhere.
5 Technically, you don’t need the final sort -nr in this pipeline to isolate duplicates because grep removes all the nonduplicates.
6 Some commands do not use stdin/stdout and therefore cannot read from pipes or write to pipes. Examples are mv and rm. Pipelines may incorporate these commands in other ways, however; you’ll see examples in Chapter 8.