Every operating system includes small tools.
Small tools written in C perform specialized small tasks, such as reading and writing files, or filtering data. If you want to perform more complex tasks, you can even link several tools together. But how are these small tools built? In this chapter, you'll look at the building blocks of creating small tools. You'll learn how to control command-line options, how to manage streams of information, and redirection, getting tooled up in no time.
A small tool is a C program that does one task and does it well. It might display the contents of a file on the screen or list the processes running on the computer. Or it might display the first 10 lines of a file or send it to the printer. Most operating systems come with a whole set of small tools that you can run from the command prompt or the terminal. Sometimes, when you have a big problem to solve, you can break it down into a series of small problems, and then write small tools for each of them.
A small tool does one task and does it well.
If one small part of your program needs to convert data from one format to another, that's the perfect kind of task for a small tool.
Take the GPS from the bike and download the data.
It creates a file called gpsdata.csv with one line of data for every location.
The geo2json tool needs to read the contents of the gpsdata.csv line by line...
...and then write that data in JSON format into a file called output.json.
The web page that contains the map application reads the output.json file.
It displays all of the locations on the map.
The problem is, instead of reading and writing files, your program is currently reading data from the keyboard and writing it to the display.
But that isn't good enough. The user won't want to type in all of the data if it's already stored in a file somewhere. And if the data in JSON format is just displayed on the screen, there's no way the map within the web page will be able to read it.
You need to make the program work with files. But how do you do that? If you want to use files instead of the keyboard and the display, what code will you have to change? Will you have to change any code at all?
Brain Power
Is there a way of making our program use files without changing code? Without even recompiling it?
Geek Bits
Tools that read data line by line, process it, and write it out again are called filters. If you have a Unix machine, or you've installed Cygwin on Windows, you already have a few filter tools installed.
head: This tool displays the first few lines of a file.
tail: This filter displays the lines at the end of a file.
sed: The stream editor lets you do things like search and replace text.
You'll see later how to combine filters together to form filter chains.
You're using scanf() and printf() to read from the keyboard and write to the display. But the truth is, they don't talk directly to the keyboard and display. Instead, they use the Standard Input and Standard Output. The Standard Input and Standard Output are created by the operating system when the program runs.
The operating system controls how data gets into and out of the Standard Input and Output. If you run a program from the command prompt or terminal, the operating system will send all of the keystrokes from the keyboard into the Standard Input. If the operating system reads any data from the Standard Output, by default it will send that data to the display.
The scanf() and printf() functions don't know, or care, where the data comes from or goes to. They just read from the Standard Input and write to the Standard Output.
Now this might sound like it's kind of complicated. After all, why not just have your program talk directly to the keyboard and screen? Wouldn't that be simpler?
Well, there's a very good reason why operating systems communicate with programs using the Standard Input and the Standard Output:
You can redirect the Standard Input and Standard Output so that they read and write data somewhere else, such as to and from files.
Instead of entering data at the keyboard, you can use the < operator to read the data from a file.
The < operator tells the operating system that the Standard Input of the program should be connected to the gpsdata.csv file instead of the keyboard. So you can send the program data from a file. Now you just need to redirect its output.
To redirect the Standard Output to a file, you need to use the > operator, as in a command like ./geo2json < gpsdata.csv > output.json.
Because you've redirected the Standard Output, you don't see any data appearing on the screen at all. But the program has now created a file called output.json.
The output.json file is the one you needed to create for the mapping application. Let's see if it works.
Your program seems to be able to read GPS data and format it correctly for the mapping application. But after a few days, a problem creeps in.
So what happened here? The problem is that there was some bad data in the GPS data file:
But the geo2json program doesn't do any checking of the data it reads; it just reformats the numbers and sends them to the output.
That should be easy to fix. You need to validate the data.
Brain Power
Study the code. What do you think happened? Is the code doing what you asked it to? Why weren't there any error messages? Why did the mapping program think that the entire output.json file was corrupt?
Geek Bits
If your program finds a problem in the data, it exits with a status of 2. But how can you check that error status after the program has finished? Well, it depends on what operating system you're using. If you're running on a Mac, Linux, some other kind of Unix machine, or if you're using Cygwin on a Windows machine, you can display the error status with the command echo $?
If you're using the Command Prompt in Windows, then it's a little different: the command is echo %ERRORLEVEL%
Both commands do the same thing: they display the number returned by the program when it finished.
The Standard Output is the default way of outputting data from a program. But what if something exceptional happens, like an error? You'll probably want to deal with things like error messages a little differently from the usual output.
That's why the Standard Error was invented. The Standard Error is a second output that was created for sending error messages.
Human beings generally have two ears and one mouth, but processes are wired a little differently. Every process has one ear (the Standard Input) and two mouths (the Standard Output and the Standard Error).
Let's see how the operating system sets these up.
Remember how when a new process is created, the operating system points the Standard Input at the keyboard and the Standard Output at the screen? Well, the operating system creates the Standard Error at the same time and, like the Standard Output, the Standard Error is sent to the display by default.
That means that if someone redirects the Standard Input and Standard Output so they use files, the Standard Error will continue to send data to the display.
And that's really cool, because it means that even if the Standard Output is redirected somewhere else, by default, any messages sent down the Standard Error will still be visible on the screen.
So you can fix the problem of our hidden error messages by simply displaying them on the Standard Error.
But how do you do that?
You've already seen that the printf() function sends data to the Standard Output. What you didn't know is that the printf() function is just a version of a more general function called fprintf(). A call to printf(...) behaves like a call to fprintf(stdout, ...).
The fprintf() function allows you to choose where you want to send text to. You can tell fprintf() to send text to stdout (the Standard Output) or stderr (the Standard Error).
With just a couple of small changes, you can get our error messages printing on the Standard Error.
That means that the code should now work in exactly the same way, except the error messages should appear on the Standard Error instead of the Standard Output.
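A minimal sketch of how the validating converter might look once its error messages go to the Standard Error. The function name geo_to_json, the range checks, and the exact JSON layout are assumptions made for illustration, not the chapter's own listing; the exit status of 2 comes from the text above. The streams are passed in as parameters so that main() can hand over stdin, stdout, and stderr:

```c
#include <stdio.h>

/* Hypothetical sketch of geo2json. Reads "latitude,longitude,info" lines
 * from `in`, writes a JSON array to `out`, and reports bad data on `err`.
 * Returns 0 on success, or 2 as soon as a value is out of range (the
 * output is then left unfinished, which is why a consumer would see it
 * as corrupt). Quotes inside the info text are not escaped. */
int geo_to_json(FILE *in, FILE *out, FILE *err)
{
    float latitude, longitude;
    char info[80];
    int started = 0;

    fputs("[", out);
    while (fscanf(in, "%f,%f,%79[^\n]", &latitude, &longitude, info) == 3) {
        if (latitude < -90.0f || latitude > 90.0f) {
            fprintf(err, "Invalid latitude: %f\n", latitude);
            return 2;
        }
        if (longitude < -180.0f || longitude > 180.0f) {
            fprintf(err, "Invalid longitude: %f\n", longitude);
            return 2;
        }
        if (started)
            fputs(",\n", out);
        started = 1;
        fprintf(out, "{\"latitude\": %f, \"longitude\": %f, \"info\": \"%s\"}",
                latitude, longitude, info);
    }
    fputs("\n]\n", out);
    return 0;
}

/* In the real tool, main() would simply do:
 *     return geo_to_json(stdin, stdout, stderr);
 */
```

Run as ./geo2json < gpsdata.csv > output.json, a tool built this way would send the JSON to the file, while any "Invalid latitude" message would still show up on the screen.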
Let's run the code and see.
One of the great things about small tools is their flexibility. If you write a program that does one thing really well, chances are you will be able to use it in lots of contexts. If you create a program that can search for text inside a file, say, then chances are you're going to find that program useful in more than one place.
For example, think about your geo2json tool. You created it to help display cycling data, right? But there's no reason you can't use it for some other purpose...like investigating...the...
To see how flexible our tool is, let's use it for a completely different problem. Instead of just displaying data on a map, let's try to use it for something a little more complex. Say you want to read in a whole set of GPS data like before, but instead of just displaying everything, let's just display the information that falls inside the Bermuda Rectangle.
That means you will display only data that matches these conditions:
((latitude > 26) && (latitude < 34))
((longitude > -76) && (longitude < -64))
So where do you need to begin?
Our geo2json tool displays all of the data it's given. So what should we do? Should we modify geo2json so that it exports data and also checks the data?
Well, we could, but remember, a small tool:
does one job and does it well
You don't really want to modify the geo2json tool, because you want it to do just one task. If you make the program do something more complex, you'll cause problems for your users who expect the tool to keep working in exactly the same way.
So if you don't want to change the geo2json tool, what should you do?
If you want to skip over the data that falls outside the Bermuda Rectangle, you should build a separate tool that does just that.
So, you'll have two tools: a new bermuda tool that filters out data that is outside the Bermuda Rectangle, and then your original geo2json tool that will convert the remaining data for the map.
This is how you'll connect the programs together:
By splitting the problem into two tasks, you will be able to leave your geo2json tool untouched. That means its current users will still be able to use it. The question is:
How will you connect your two tools together?
You've already seen how to use redirection to connect the Standard Input and the Standard Output of a program to a file. But now you'll connect the Standard Output of the bermuda tool to the Standard Input of the geo2json tool.
The | symbol is a pipe that connects the Standard Output of one process to the Standard Input of another process.
That way, whenever the bermuda tool sees a piece of data inside the Bermuda Rectangle, it will send the data to its Standard Output. The pipe will send that data from the Standard Output of the bermuda tool to the Standard Input of the geo2json tool.
The operating system will handle the details of exactly how the pipe will do this. All you have to do to get things running is issue a command like this:
So now it's time to build the bermuda tool.
The bermuda tool will work in a very similar way to the geo2json tool: it will read through a set of GPS data, line by line, and then send data to the Standard Output.
But there will be two big differences. First, it won't send every piece of data to the Standard Output, just the lines that are inside the Bermuda Rectangle. The second difference is that the bermuda tool will always output data in the same CSV format used to store GPS data.
This is what the pseudocode for the tool looks like:
Let's turn the pseudocode into C.
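The filter described above might be sketched in C roughly as follows. The function name bermuda_filter and the return value (a count of the lines kept) are assumptions made for illustration; the rectangle limits are the conditions given earlier, and the output keeps the same CSV layout as the input:

```c
#include <stdio.h>

/* Hypothetical sketch of the bermuda tool. Copies only the CSV lines whose
 * coordinates fall inside the Bermuda Rectangle from `in` to `out`.
 * Returns the number of lines that were kept. */
int bermuda_filter(FILE *in, FILE *out)
{
    float latitude, longitude;
    char info[80];
    int kept = 0;

    while (fscanf(in, "%f,%f,%79[^\n]", &latitude, &longitude, info) == 3) {
        if ((latitude > 26) && (latitude < 34) &&
            (longitude > -76) && (longitude < -64)) {
            /* Same CSV format as the input, so geo2json can read it. */
            fprintf(out, "%f,%f,%s\n", latitude, longitude, info);
            kept++;
        }
    }
    return kept;
}

/* In the real tool, main() would call bermuda_filter(stdin, stdout)
 * and return 0. */
```

Assuming both tools are compiled into the current directory, a shell command along the lines of (./bermuda | ./geo2json) < spooky.csv > output.json would feed the filtered lines straight into the converter.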
Do this!
You can download the spooky.csv file at http://oreillyhfc.appspot.com/spooky.csv.
We've looked at how to read data from one file and write to another file using redirection, but what if the program needs to do something a little more complex, like send data to more than one file?
Imagine you need to create another tool that will read a set of data from a file, and then split it into other files.
So what's the problem? You can't write to files, right? Trouble is, with redirection you can write to only two files at most, one from the Standard Output and one from the Standard Error. So what do you do?
When a program runs, the operating system gives it three file data streams: the Standard Input, the Standard Output, and the Standard Error. But sometimes you need to create other data streams on the fly.
The good news is that the operating system doesn't limit you to the ones you are dealt when the program starts. You can roll your own as the program runs.
Each data stream is represented by a pointer to a file, and you can create a new data stream using the fopen() function.
The fopen() function takes two parameters: a filename and a mode. The mode can be "w" to write to a file, "r" to read from a file, or "a" to append data to the end of a file.
Once you've created a data stream, you can print to it using fprintf(), just like before. But what if you need to read from a file? Well, there's also an fscanf() function to help you do that too:
The mode is: "w" = write, "r" = read, or "a" = append.
fprintf(out_file, "Don't wear %s with %s", "red", "green");
fscanf(in_file, "%79[^\n]\n", sentence);
Finally, when you're finished with a data stream, you need to close it. The truth is that all data streams are automatically closed when the program ends, but it's still a good idea to always close the data stream yourself:
fclose(in_file);
fclose(out_file);
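Putting those calls together, here is a small sketch of a complete open, write, read, and close cycle. The function name write_then_read and the idea of writing a sentence and reading it straight back are assumptions made purely for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: writes one sentence to the file at `path`, then
 * reopens the file and reads the sentence back into `sentence`.
 * Returns 0 on success, 1 if a file can't be opened or read. */
int write_then_read(const char *path, char sentence[80])
{
    FILE *out_file = fopen(path, "w");      /* "w" = write */
    if (!out_file)
        return 1;
    fprintf(out_file, "Don't wear %s with %s\n", "red", "green");
    fclose(out_file);                       /* finished writing */

    FILE *in_file = fopen(path, "r");       /* "r" = read */
    if (!in_file)
        return 1;
    /* Read up to 79 characters, stopping at the end of the line. */
    int fields = fscanf(in_file, "%79[^\n]\n", sentence);
    fclose(in_file);                        /* finished reading */

    return fields == 1 ? 0 : 1;
}
```

The 79 in the format string leaves room for the terminating '\0' in an 80-character array, so a long line can't overflow the buffer.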
Let's try this out now.
If you compile and run the program, it will read the spooky.csv file and split up the data, line by line, into three other files: ufos.csv, disappearances.csv, and other.csv.
That's great, but what if a user wanted to split up the data differently? What if he wanted to search for different words or write to different files? Could he do that without needing to recompile the program each time?
The thing is, any program you write will need to give the user the ability to change the way it works. If it's a GUI program, you will probably need to give it preferences. And if it's a command-line program, like our categorize tool, it will need to give the user the ability to pass it command-line arguments.
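The arguments are passed to the main() function as an array of strings, conventionally called argv. Like any array in C, you need some way of knowing how long the array is. That's why the main() function has two parameters. The argc value is a count of the number of elements in the array.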
Command-line arguments really give your program a lot more flexibility, and it's worth thinking about which things you want your users to tweak at runtime. It will make your program a lot more valuable to them.
OK, let's see how you can add a little flexibility to the categorize program.
Watch it!
The first argument contains the name of the program as it was run by the user.
That means that the first proper command-line argument is argv[1].
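To make this concrete, here is one way a categorize-style tool could take its search word and filenames from argv instead of having them hardcoded. The argument order (word, input file, matches file, others file) and the function name categorize are assumptions made for illustration, not the chapter's own design:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical argument layout (an assumption):
 *     categorize <word> <input file> <matches file> <others file>
 * argv[0] is the program name, so the first real argument is argv[1].
 * Returns 0 on success, 1 on a usage or file error. */
int categorize(int argc, char *argv[])
{
    char line[80];

    if (argc != 5) {
        fprintf(stderr, "Usage: %s word input matches others\n", argv[0]);
        return 1;
    }

    FILE *in = fopen(argv[2], "r");
    FILE *matches = fopen(argv[3], "w");
    FILE *others = fopen(argv[4], "w");
    if (!in || !matches || !others) {
        fprintf(stderr, "Can't open one of the files.\n");
        if (in) fclose(in);
        if (matches) fclose(matches);
        if (others) fclose(others);
        return 1;
    }

    /* Lines containing the word go to one file, the rest to the other. */
    while (fscanf(in, "%79[^\n]\n", line) == 1) {
        if (strstr(line, argv[1]))
            fprintf(matches, "%s\n", line);
        else
            fprintf(others, "%s\n", line);
    }

    fclose(in);
    fclose(matches);
    fclose(others);
    return 0;
}

/* In the real tool, main(int argc, char *argv[]) would just do:
 *     return categorize(argc, argv);
 */
```

With this layout, a command such as ./categorize UFO spooky.csv ufos.csv other.csv would split the file without any recompiling. Checking argc before touching argv[1] matters, because reading past the end of the array is undefined behavior.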
Safety Check
Although at Head First Labs we never make mistakes (cough), it's important in real-world programs to check for problems when you open a file for reading or writing. Fortunately, if there's a problem opening a data stream, the fopen() function will return the value 0 (a NULL pointer). That means if you want to check for errors, you should change code like:
FILE *in = fopen("i_dont_exist.txt", "r");
to this:
FILE *in;
if (!(in = fopen("i_dont_exist.txt", "r"))) {
    fprintf(stderr, "Can't open the file.\n");
    return 1;
}
Chances are, any program you write is going to need options. If you create a chat program, it's going to need preferences. If you write a game, the user will want to change the shape of the blood spots. And if you're writing a command-line tool, you are probably going to need to add command-line options.
Command-line options are the little switches you often see with command-line tools:
Many programs use command-line options, so there's a special library function you can use to make dealing with them a little easier. It's called getopt(), and each time you call it, it returns the next option it finds on the command line.
Let's see how it works. Imagine you have a program that can take a set of different options:
This program needs one option that will take a value (-e = engines) and another that is simply on or off (-a = awesomeness). You can handle these options by calling getopt() in a loop.
Inside the loop, you have a switch statement to handle each of the valid options. The string ae: tells the getopt() function that a and e are valid options. The e is followed by a colon to tell getopt() that the -e needs to be followed by an extra argument. getopt() will point to that argument with the optarg variable.
When the loop finishes, you tweak the argv and argc variables to skip past all of the options and get to the main command-line arguments. After that, argv[0] is no longer the program name but the first argument that isn't an option.
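A minimal sketch of such a getopt() loop, using the ae: option string described above. The struct options type, the function name parse_options, and the usage message are assumptions made for illustration; getopt(), optarg, and optind are the POSIX names declared in unistd.h:

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations such as getopt() */
#include <stdio.h>
#include <unistd.h>              /* getopt(), optarg, optind */

/* Hypothetical container for the parsed options. */
struct options {
    const char *engines;  /* value given to -e, or NULL if -e was absent */
    int awesomeness;      /* 1 if -a was given, otherwise 0 */
};

/* Parses -a and -e <value>. Returns the index of the first argument that
 * isn't an option, or -1 if an invalid option was found. */
int parse_options(int argc, char *argv[], struct options *opts)
{
    int ch;

    opts->engines = NULL;
    opts->awesomeness = 0;

    while ((ch = getopt(argc, argv, "ae:")) != -1) {
        switch (ch) {
        case 'a':
            opts->awesomeness = 1;
            break;
        case 'e':
            opts->engines = optarg;  /* points at the value after -e */
            break;
        default:
            fprintf(stderr, "Usage: %s [-a] [-e engines] args...\n", argv[0]);
            return -1;
        }
    }
    return optind;
}

/* In main(), the result can be used to skip past the options:
 *     int first = parse_options(argc, argv, &opts);
 *     if (first < 0) return 1;
 *     argc -= first;
 *     argv += first;
 */
```

For a hypothetical command line such as rocket_to -e 4 -a Brasilia Tokyo, the loop would record 4 engines and awesomeness switched on, and after the adjustment argv[0] would be Brasilia.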
You've got Chapter 3 under your belt, and now you've added small tools to your toolbox. For a complete list of tooltips in the book, see Appendix B.