Chapter 4. Bits and (Many) Bytes
Before we start building more complex programs with things like functions in Chapter 5, we should cover two more useful storage categories in C: arrays and individual bits. These aren't really distinct types like int or double, but they are useful when dealing with tiny things or with lots of things. Indeed, the notion of an array, a sequential list of items, is so useful we had to cheat back in "Getting User Input" and use it without much explanation to store user input in the form of a string.
We have also discussed the idea of Boolean values that are either yes or no, true or false, 1 or 0. When dealing with microcontrollers in particular, you will regularly have a small collection of sensors or switches that are providing on/off values. C's normal storage options would mean devoting an entire char (8 bits) or int (16 bits on many microcontrollers, and typically 32 bits on desktop systems) to keeping track of such tiny values. That feels like a bit (ha!) of a waste, and it is. C has a few tricks you can employ to store this type of information more efficiently. In this chapter, we'll tackle both the big stuff, by declaring arrays and then accessing and manipulating their contents, and the smallest bits (ahem). (And I promise not to make more bit puns. Mostly.)
Storing Multiple Things with Arrays
It is almost impossible to find a C program tackling real-world problems that does not use arrays. If you have to work with any collection of values of any type at all, those values will almost certainly wind up in an array. A list of grades, a list of students, the list of US state abbreviations, etc., etc., etc. Even our tiny machines can use arrays to track the colors on a strip of LEDs. It is not an exaggeration to say arrays are ubiquitous in C, so let's take a closer look at how to use them.
Creating and Manipulating Arrays
As I mentioned, we used an array back in Chapter 2 (in "Getting User Input") to allow for some user input. Let's revisit that code (ch04/hello2.c) and pay more attention to the array of characters:
#include <stdio.h>

int main() {
  char name[20];

  printf("Enter your name: ");
  scanf("%s", name);
  printf("Well hello, %s!\n", name);
}
So what exactly does that char name[20] declaration do? It creates a variable named "name" with a base type of char, but it is an array, so you get space to store multiple chars. In this case, we asked for 20 bytes, as illustrated in Figure 4-1.
And what happens with this array variable when we run the program? When you type in a name and hit Return on your keyboard, the characters you typed get placed in the array. Since we used scanf() and its string (%s) format field, we will automatically get a trailing null character ('\0' or sometimes '\000') that marks the end of the string. In memory, the name variable now looks like Figure 4-2.
Note
The null character at the end of the array is a peculiarity of strings; it is not how other types of arrays are managed. Strings are often stored in arrays that are set up before the length of the string is known, and use this '\0' sentinel much like we did in "The while Statement" to mark the end of useful input. All string-processing functions in C expect to see this terminating character, and you can count on its existence in your own work with strings.
Now when we use the name variable again in the subsequent printf() call, we can echo back all of the letters that were stored, and the null character tells printf() when to stop, even if the name doesn't occupy the entire array. Conversely, printing a string that does not have the terminating character will cause printf() to keep reading past the end of the array, printing garbage or even crashing the program.
Length versus capacity
Didn't we allocate 20 character slots? What are they doing if our name (such as "Grace") doesn't occupy all of the slots? Happily, that final, null character solves this quandary rather neatly. We do indeed have room for longer names like "Alexander" or even "Grace Hopper"; the null character always marks the end, no matter how big the array is. (Typing "Grace Hopper" at our prompt would store only "Grace", though, because the %s specifier in scanf() stops reading at the first space.)
Warning
If you haven't worked with characters before in C or another language, the notion of a null character can be confusing. It is the character with the numeric value of 0 (zero). That is not the same thing as a space character (ASCII 32), the digit 0 (ASCII 48), or a newline ('\n', ASCII 10). You usually don't have to worry about adding or placing these nulls by hand, but it is important to remember they occur at the end of strings, even though they are never printed.
But what if the name was too long for the allocated array? Let's find out! Run the program again and type in a longer name:
ch04$ ./a.out
Enter your name: @AdmiralGraceMurrayHopper
Well hello, @AdmiralGraceMurrayHopper!
*** stack smashing detected ***: terminated
Aborted (core dumped)
Interesting. So the capacity we declared is a fairly hard limit: things go wrong if we overflow an array.1 Good to know! We always need to reserve sufficient space before we use it.2
What if we didn't know ahead of time how many slots were in an array? The C sizeof operator can help. It can tell you the size (in bytes) of variables or types. For simple types, that is the size of an int or char or double. For arrays, it is the total memory allocated. That means we can tell how many slots we have in an array as long as we know its base type. Let's try making an array of double values, say, for an accounting ledger. We'll pretend we don't know how many values we can store and use sizeof to find out. Take a look at ch04/capacity.c:
#include <stdio.h>

int main() {
  double ledger[100];

  // sizeof yields a size_t; %zu is the matching printf() specifier
  printf("Size of a double: %zu\n", sizeof(double));
  printf("Size of ledger: %zu\n", sizeof ledger);
  printf("Calculated ledger capacity: %zu\n", sizeof ledger / (sizeof(double)));
}
Notice that when asking about the size of a type, you need parentheses: C's grammar only accepts a type name after sizeof when the type is wrapped in parentheses. For variables like ledger, which are already ordinary expressions, the parentheses are optional, so we can leave them off.
Let's run our tiny program. Here's the output:
ch04$ gcc capacity.c
ch04$ ./a.out
Size of a double: 8
Size of ledger: 800
Calculated ledger capacity: 100
Nice. Since we actually do know how big we made our array, we can just compare that chosen size to our calculated results. They match. (Whew!) But there are situations where you are given information from an independent source and won't always know the size of the array. Remember that tools like sizeof exist and can help you understand that information.
Initializing arrays
So far, we've created empty arrays or loaded char arrays with input from the user at runtime. Just like simpler variable types, C allows you to initialize arrays when you define them.
For any array, you can supply a list of values inside a pair of curly braces, separated by commas. Here are a few examples:
int days_in_month[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
char vowels[6] = {'a', 'e', 'i', 'o', 'u', 'y'};
float readings[7] = {8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5};
Notice that the declared size of the array matches the number of values supplied to initialize the array. In this situation, C allows a nice shorthand: you can omit the explicit size in between the square brackets. The compiler will allocate the correct amount of memory to fit the initialization list exactly. This means we could rewrite our previous snippet like this:
int days_in_month[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
char vowels[] = {'a', 'e', 'i', 'o', 'u', 'y'};
float readings[] = {8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5};
Strings, however, are a special case. C supports the notion of string literals. This means you can use a sequence of characters between double quotes as a value. You can use a string literal to initialize a char[] variable. You can also use it almost anywhere a string variable would be allowed. (We saw this in "The Ternary Operator and Conditional Assignment" where we used the ternary operator (?:) to print true and false values as words instead of as 1 or 0.)
// Special initialization of a char array with a string literal
char secret[] = "password1";

// The printf() format string is usually a string literal
printf("Hello, world!\n");

// And we can print literals, too
printf("The value stored in %s is '%s'\n", "secret", secret);
You can also initialize a string by supplying individual characters inside curly braces, but that is generally harder to read. You have to remember to include the terminating null character, and this verbose option doesn't provide any other real advantage over the use of a string literal.
Accessing array elements
Once you have an array created, you can access individual elements inside the array using square brackets. You give an index number inside the square brackets, where the first element has an index value of 0. To print the second vowel or the days in July from our earlier arrays, for example:
printf("The second vowel is: %c\n", vowels[1]);
printf("July has %d days.\n", days_in_month[6]);
These statements would produce the following output if bundled into a complete program:
The second vowel is: e
July has 31 days.
But the value we supply inside the square brackets does not need to be a fixed number. It can be any expression that results in an integer. (If you have enough memory, it could be a long or other, larger integer type.) This means you can use a calculation or a variable as your index. For example, if we store the "current month" in a variable and use the typical values for months (January is 1, February is 2, and so on), then we could print the number of days in July using the following code:
int month = 7;
printf("July (month %d) has %d days.", month, days_in_month[month - 1]);
The ease and flexibility of accessing these members is part of what makes arrays so popular. After a bit of practice, you'll find them indispensable!
Warning
The value inside the square brackets needs to be "in bounds," or you'll get unpredictable results at runtime. For example, if you tried printing the days in the 15th month like we did for July, you'd see something like "Invalid (month 15) has -1574633234 days." C won't stop you (note we did not cause a crash), but neither did we get a usable value. And assigning values (which we discuss next) to invalid slots in an array is how you cause a buffer overflow. This classic security exploit gets its name from the notion of an array as a storage buffer. You "overflow" it by writing values past the end of the array's actual storage. If you get lucky (or are very devious), you can write executable code and trick the computer into running your commands instead of the intended program.
Changing array elements
You can also change the value of a given array position using the square bracket notation. For example, we could alter the number of days in February to accommodate a leap year:
if (year % 4 == 0) {
  // Forgive the naive leap year calculation :)
  days_in_month[1] = 29;
}
This type of post-declaration assignment is handy (or often even necessary) when you have more dynamic data. With the Arduino projects we'll cover later, for example, you might want to keep the 10 most recent sensor readings. You won't have those readings when you declare your array. So you can set aside 10 slots, and just fill them in later:
float readings[10];
// ... interesting stuff goes here to set up the sensor and read it
readings[7] = latest_reading;
Just make sure you supply a value of the same type as (or at least compatible with) the array. Our readings array, for example, is expecting floating-point numbers. If we were to assign a character to one of the slots, it would "fit" in that slot, but it would produce a strange answer. Assigning the letter x to readings[8] would end up putting the ASCII value of lowercase x (120) in the slot as a float value of 120.0.
Iterating through arrays
The ability to use a variable as an index makes working with an entire array a simple loop task. We could print out all the days_in_month counts using a for loop, for example:
for (int m = 0; m < 12; m++) {
  // remember the array starts at 0, but humans start at 1
  printf("Days in month %d is %d.\n", m + 1, days_in_month[m]);
}
This snippet produces the following output. We can get a sense of just how powerful the combination of arrays and loops could be. With just a tiny bit of code, we get some fairly interesting output:
Days in month 1 is 31.
Days in month 2 is 28.
Days in month 3 is 31.
Days in month 4 is 30.
Days in month 5 is 31.
Days in month 6 is 30.
Days in month 7 is 31.
Days in month 8 is 31.
Days in month 9 is 30.
Days in month 10 is 31.
Days in month 11 is 30.
Days in month 12 is 31.
You're free to use the elements of your array however you need to. You aren't limited to printing them out. As another example, we could calculate the average reading from our readings array like so:
float readings[] = {8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5};
// Use our sizeof trick to get the number of elements
int count = sizeof readings / sizeof(float);
float total = 0.0;
float average;
for (int r = 0; r < count; r++) {
  total += readings[r];
}
average = total / count;
printf("The average reading is %0.2f\n", average);
This example highlights just how much C you have learned in only a few chapters! If you want some more practice, build this snippet into a complete program. Compile and run it to make sure you have it working. (The average should be 8.70, by the way.) Then add some more variables to capture the highest and lowest readings. You'll need some if statements to help there. You can see one possible solution in arrays.c in the examples for this chapter.
Review of Strings
I have noted that strings are really just arrays of type char with some extra features supported by the language itself, such as literals. But since strings represent the easiest way to communicate with users, I want to highlight more of what you can do with strings in C.
Initializing strings
We have already seen how to declare and initialize a string. If you know the value of the string ahead of time, you can use a literal. If you don't know the value, you can still declare the variable and then use scanf() to ask the user what text to store. But what if you wanted to do both? Assign an initial default and then let the user supply an optional new value that overrides the default?
Happily, you can get there, but you do have to plan ahead a little. It might be tempting to use the default value when you first declare your variable, and then let the user provide a different value at runtime if they want. This works, but it requires an extra question to the user ("Do you want to change the background color, yes or no?") and also assumes the user will supply a valid value as an alternative. Such assumptions are often safe as you are likely the only user while you're learning a new language. But in programs you share with others, it's better not to assume what the user will do.
String literals also make it tempting to think you can simply overwrite an existing string just like you can with int or float variables. But a string really is just a char[], and arrays are not assignable beyond the optional initialization when you declare them.
These limitations can all be overcome with the use of things like functions, which we'll explore in Chapter 5. In fact, the functions that make it possible to manipulate strings at runtime are so useful that they have been bundled up into their own part of the standard library (declared in the string.h header), which I cover in Chapter 7.
For now, I want you to remember that string literals can make the initialization of character arrays simple and readable, but that at their heart, strings in C are not like numbers and individual characters.
Accessing individual characters
But I do want to reiterate that strings are just arrays. You can access individual characters in your string using the same syntax you use to access the members of any other array. For example, we could find out if a given phrase contains a comma by looking at each character in the phrase. Here's ch04/comma.c:
#include <stdio.h>

int main() {
  char phrase[] = "Hello, world!";
  int i = 0;

  // keep looping until the end of the string
  while (phrase[i] != '\0') {
    if (phrase[i] == ',') {
      printf("Found a comma at position %d.\n", i);
      break;
    }
    // try the next character
    i++;
  }
  if (phrase[i] == '\0') {
    // Rats. Made it to the end of the string without a match.
    printf("No comma found in %s\n", phrase);
  }
}
This program actually uses the array nature of the string a few times. Our loop condition depends on accessing a single character of the string, just like the if condition that helps answer our original question. And we test an individual character at the very end to see if we found something or not. We'll look at several string-related functions in Chapter 7, but hopefully you see how you could accomplish things like copying or comparing strings using a good loop and the square brackets to march one character at a time through your array.
Multidimensional Arrays
It may not be obvious since strings are already an array, but you can store an array of strings in C. But because there is no "string type" that you can use when declaring such an array, how do you do it? Turns out C supports the idea of a multidimensional array so you can create an array of char[] just like other arrays:
char month_names[][];  // not complete yet: the compiler needs sizes here
Seems fair. But what is not obvious in that declaration is what the two pairs of square brackets refer to. When declaring a two-dimensional array like this, the first pair can be thought of as the row index, and the second as the column. Another way to think about it is that the first index tells you how many character arrays we'll be storing, and the second tells you how long each of those arrays can be.
We know how many months there are, and a little research tells us the longest name is September, with nine letters. Add on one more for our terminating null character, and 10 characters per name would be enough. The following declaration uses 11, which simply leaves one spare slot in each row:
char month_names[12][11];
You could also initialize this two-dimensional array since we know the names of the months and don't require user input:
char month_names[12][11] = {
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
};
But here I cheated a little with the initialization by using string literals, so the second dimension of the month_names array isn't readily apparent. The first dimension is the months, and the second (hidden) dimension is the individual characters that make up the month names. If you are working with other data types that don't have this string literal shortcut, you can use nested curly brace lists like this:
int multiplication[5][5] = {
  { 0, 0, 0, 0,  0 },
  { 0, 1, 2, 3,  4 },
  { 0, 2, 4, 6,  8 },
  { 0, 3, 6, 9, 12 },
  { 0, 4, 8, 12, 16 }
};
It might be tempting to assume the compiler can determine the size of the multidimensional structure, but sadly, you must supply the capacity for each dimension beyond the first. For our month names, for example, we could start off without the 12 for how many names, but not without the 11 indicating the capacity of each individual name:
// This shortcut is ok
char month_names[][11] = { "January", "February" /* ... */ };

// This shortcut is NOT
char month_names[][] = { "January", "February" /* ... */ };
You'll eventually internalize these rules, but the compiler (and many editors) will always be there to catch you if you make a small mistake.
Accessing Elements in Multidimensional Arrays
With our array of month names, it is straightforward to get access to any particular month. It looks just like accessing an element of any other one-dimensional array:
printf("The name of the first month is: %s\n", month_names[0]);
// Output: The name of the first month is: January
But how would we access an element in the multiplication two-dimensional array? We use two indices:
printf("Looking up 3 x 4: %d\n", multiplication[3][4]);
// Output: Looking up 3 x 4: 12
Notice that in this multiplication table, the potentially strange use of zero as the first index value turns out to be useful. Index 0 gives us a row (or column) of valid multiplication answers, since anything times zero is zero.
And with two indices, you'll need two loops if you want to print out all of the data. We can take the work we did in "Nested Loops and Tables" and use it to access our stored values rather than generating the numbers directly. Here's the printing snippet from ch04/print2d.c:
for (int row = 0; row < 5; row++) {
  for (int col = 0; col < 5; col++) {
    printf("%3d", multiplication[row][col]);
  }
  printf("\n");
}
And here is our nicely formatted table:
ch04$ gcc print2d.c
ch04$ ./a.out
  0  0  0  0  0
  0  1  2  3  4
  0  2  4  6  8
  0  3  6  9 12
  0  4  8 12 16
We'll see some other options in Chapter 6 for more tailored multidimensional storage. In the near term, just remember that you can create more dimensions with more pairs of square brackets. While you'll likely use one-dimensional arrays most of the time, tables are common enough, and spatial data often fits in three-dimensional "cubes." Few programmers will ever need it, especially those of us concentrating on microcontrollers, but C does support higher orders of arrays.
Storing Bits
Arrays allow us to store truly vast quantities of data with relative ease. At the other end of the spectrum, C has several operators that you can use to manipulate very small amounts of data. Indeed, you can work with the absolute smallest pieces of data: individual bits.
When C was developed in the 1970s, every byte of memory was expensive, and therefore precious. As I noted at the beginning of the chapter, if you had a particular variable that stored Boolean answers, using 16 bits for an int or even just 8 bits for a char would be a little wasteful. If you had an array of such variables, it could become very wasteful. Desktop computers these days can manage that type of waste without blinking an eye (or an LED), but our microcontrollers often need all the storage help they can get.
Binary, Octal, Hexadecimal
Before we tackle the operators in C that access and manipulate bits, let's review some notation for discussing binary values. If we have a single bit, a 0 or a 1 is sufficient, and that's easy enough. However, if we want to store a dozen bits inside one int variable, we need a way to describe the value of that int. Technically, the int will have a decimal (base 10) representation, but base 10 does not map cleanly to individual bits. For that, octal and hexadecimal notations are much clearer. (Binary, or base 2, notation would obviously be clearest, but large numbers get very long in binary. Octal and hexadecimal, often just "hex," are a good compromise.)
When we talk about numbers, we often implicitly use base 10, thanks to the digits (ooh, get it?) on our hands. Computers don't have hands (discounting robots, of course) and don't count in base 10. They use binary. Two digits, 0 and 1, make up the entirety of their world. If you group three binary digits, you can represent the decimal numbers 0 through 7, which is eight total numbers, so this is base 8, or octal. Add a fourth bit and you can represent 0 through 15, which covers the individual "digits" in hexadecimal. Table 4-1 shows these first 16 values in all four bases.
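Decimal | Binary | Octal | Hexadecimal | Decimal | Binary | Octal | Hexadecimal |
---|---|---|---|---|---|---|---|
0 | 0000 0000 | 000 | 0x00 | 8 | 0000 1000 | 010 | 0x08 |
1 | 0000 0001 | 001 | 0x01 | 9 | 0000 1001 | 011 | 0x09 |
2 | 0000 0010 | 002 | 0x02 | 10 | 0000 1010 | 012 | 0x0A |
3 | 0000 0011 | 003 | 0x03 | 11 | 0000 1011 | 013 | 0x0B |
4 | 0000 0100 | 004 | 0x04 | 12 | 0000 1100 | 014 | 0x0C |
5 | 0000 0101 | 005 | 0x05 | 13 | 0000 1101 | 015 | 0x0D |
6 | 0000 0110 | 006 | 0x06 | 14 | 0000 1110 | 016 | 0x0E |
7 | 0000 0111 | 007 | 0x07 | 15 | 0000 1111 | 017 | 0x0F |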
You might notice that I always showed eight digits for the binary column, three for octal, and two for hex. The byte (8 bits) is a very common unit to work with in C. Binary numbers often get shown in groups of four, with as many groups as required to cover the largest number being discussed. So for a full byte of 8 bits, which can store any value from 0 to 255, for example, you would see a binary value with two groupings of four digits. Similarly, octal values with three digits can display any byte's value, and hexadecimal numbers need two digits. Note also that hexadecimal literals are not case sensitive. (Neither is the "x" in the hexadecimal prefix, but an uppercase "X" can be harder to distinguish.)
We'll be using binary notation from time to time when working with microcontrollers in the latter half of this book, but you may have already run into hexadecimal numbers if you have written any styled text in HTML or CSS or similar markup languages. Colors in these documents are often represented with the hex values for a byte of red, a byte of green, a byte of blue, and occasionally a byte of alpha (transparency). So a full red that ignores the alpha channel would be FF0000. Now that you know two hex digits can represent one byte, it may be easier to read such color values.
To help you get accustomed to these different bases, try filling out the missing values in Table 4-2. (You can check your answers against Table 4-4 at the end of the chapter.) The numbers are not in any particular order, by the way. I want to keep you on your toes!
Decimal | Binary | Octal | Hexadecimal |
---|---|---|---|
[Table 4-2 exercise rows not recoverable]
Modern browsers can convert bases for you right in the search bar, so you probably won't need to memorize the full 256 values possible in a byte. But it will still be useful if you can estimate the size of a hex value or determine if an octal ASCII code is probably a letter or a number.
Octal and Hexadecimal Literals in C
The C language has special options for expressing numeric literals in octal and hex. Octal literals start with a simple 0 as a prefix, although you can have multiple zeroes if you are keeping all of your values the same width, like we did in our base tables. For hex values, you use the prefix 0x or 0X. You typically match the case of the "X" character to the case of any of the A-F digits in your hex value, but this is just a convention.
Here's a snippet showing how to use some of these prefixes:
int line_feed = 012;        // octal 12 is decimal 10
int carriage_return = 015;  // octal 15 is decimal 13
int red = 0xff;             // hex ff is decimal 255
int blue = 0x7f;            // hex 7f is decimal 127
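Some compilers support nonstandard prefixes or suffixes for representing binary literals (GCC and Clang, for example, accept a 0b prefix as an extension). As the "nonstandard" qualifier suggests, these were not part of the official C language until the C23 standard adopted the 0b prefix.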
Input and Output of Octal and Hex Values
The printf() function has built-in format specifiers to help you produce octal or hexadecimal output. Octal values can be printed with the %o specifier, and hex can be shown with either %x or %X, depending on whether you want lower- or uppercase output. These specifiers can be used with integer variables or expressions no matter which base you used to write them, which makes printf() a pretty easy way to convert from decimal to octal or hex. We could easily produce a table similar to Table 4-1 (minus the binary column) using a loop and a single printf(). We can take advantage of the width and padding options of the format specifier to get our desired three octal digits and two hex digits. Take a look at ch04/dec_oct_hex.c:
#include <stdio.h>

int main() {
  printf(" Dec Oct Hex\n");
  for (int i = 0; i < 16; i++) {
    printf(" %3d %03o 0x%02X\n", i, i, i);
  }
}
Notice that we just reuse the exact same variable for each of the three columns. Also notice that when printing the hexadecimal version, I manually added the "0x" prefix; it is not included in the %x or %X formats. Here are a few of the first and last lines:
ch04$ gcc dec_oct_hex.c
ch04$ ./a.out
 Dec Oct Hex
   0 000 0x00
   1 001 0x01
   2 002 0x02
   3 003 0x03
...
  13 015 0x0D
  14 016 0x0E
  15 017 0x0F
Neat. Just the output we wanted. On the input side, using scanf(), the format specifiers work in an interesting way. They are all still used to get numeric input from the user, but the different specifiers now perform base conversion on the number you enter. If you specify decimal input (%d), you cannot use hex values. Conversely, if you specify hex input (%x or %X) and only enter numbers (i.e., you don't use any of the A-F digits), the number will still be converted from base 16.
Note
The specifiers %d and %i are normally interchangeable. In a printf() call, they will result in identical output. In a scanf() call, however, the %d option requires you to enter a simple base 10 number. The %i specifier allows you to use the various C literal prefixes to enter a value in a different base, such as 0x to enter a hexadecimal number (or a leading 0 for octal).
We can illustrate this with a simple converter program, ch04/rosetta.c, that will translate different inputs to all three bases on output. We can set which type of input we expect in the program but use an if/else if/else block to make it easy to adjust. (Although recompiling will still be required.)
#include <stdio.h>

int main() {
  char base;
  int input;

  printf("Convert from? (d)ecimal, (o)ctal, he(x): ");
  scanf("%c", &base);

  if (base == 'o') {
    // Get octal input
    printf("Please enter a number in octal: ");
    scanf("%o", &input);
  } else if (base == 'x') {
    // Get hex input
    printf("Please enter a number in hexadecimal: ");
    scanf("%x", &input);
  } else {
    // assume decimal input
    printf("Please enter a number in decimal: ");
    scanf("%d", &input);
  }
  printf("Dec: %d, Oct: %o, Hex: %x\n", input, input, input);
}
Here are a few example runs:
ch04$ gcc rosetta.c
ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): d
Please enter a number in decimal: 55
Dec: 55, Oct: 67, Hex: 37
ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): x
Please enter a number in hexadecimal: 37
Dec: 55, Oct: 67, Hex: 37
ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): d
Please enter a number in decimal: 0x37
Dec: 0, Oct: 0, Hex: 0
Interesting. The first two runs went according to plan. The third run didn't create an error but didn't really work, either. What happened here is a sort of "feature" of scanf(). It tried very hard to bring in a decimal number. It found the character 0 in our input, which is a valid decimal digit, so it started parsing that character. But it next encountered the x character, which is not valid for a base 10 number. So that was the end of the parsing, and our program converted the value 0 into each of the three bases.
Try running this program yourself and switch the mode a few times. Do you get the behavior you expect? Can you cause any errors?
Knowing what we do about the difference between %i and other numeric specifiers in scanf(), can you see how to make this program a little simpler? It should be possible to accept any of the three bases for input without the big if statement. I'll leave this problem to you as an exercise, but you can see one possible solution in the rosetta2.c file in the code examples for this chapter.
Bitwise Operators
Starting out on limited hardware like C did means occasionally working with data at the bit level, quite apart from printing or reading in binary data. C supports this work with bitwise operators. These operators allow you to tweak individual bits inside int variables (or char or long, of course). We'll see some fun uses of these features with the Arduino microcontroller in Chapter 10.
Table 4-3 describes these operators and shows some examples that make use of the following two variables:
char a = 0xD;  // 1101 in binary
char b = 0x7;  // 0111 in binary
Operator | Name | Description | Example |
---|---|---|---|
& | bitwise and | Both bits must be 1 to yield a 1 | a & b == 0101 |
\| | bitwise or | Either bit can be 1 to yield a 1 | a \| b == 1111 |
~ | bitwise not | Yields the opposite of the input bit | ~a == 1111 0010 |
^ | bitwise xor | eXclusive OR, bits that don't match yield a 1 | a ^ b == 1010 |
<< | left shift | Move bits to the left by a number of places | a << 3 == 0110 1000 |
>> | right shift | Move bits to the right by a number of places | b >> 2 == 0001 |
Bitwise operators work only on integer types: char, short, int, long, and their unsigned variants. C will not compile a bitwise operation on a float or double. You usually pick an integer type that is big enough to hold however many individual bits you need. Because these operators are "editing" the bits of a given variable, you often see them used with compound assignment operators (op=) such as &=, |=, ^=, <<=, and >>=.
If you have five LEDs, for example, you could keep track of their on/off state with a single char type variable, as in this snippet:
```c
char leds = 0;    // Start with everyone off, 0000 0000
leds |= 8;        // Turn on the 4th LED from the right, 0000 1000
leds ^= 0x1f;     // Toggle all five lights, 0001 0111
leds &= 0x0f;     // Turn off 5th LED, leave others as is, 0000 0111
```
Five int or char values likely won't make the difference in whether you can store or run a program on a microcontroller, even one with only one or two kilobytes of memory, but those small storage needs do add up. If you're tracking a panel of LEDs with hundreds or thousands of lights, it makes a difference how tightly you can store their state. One size rarely fits all, so remember your options and pick one that balances ease of use against any resource constraints you have.
Mixing Bits and Bytes
We now have enough elements of C under our belts to start writing some really interesting code. We can combine all of our previous discussions on bits, arrays, types, looping, and branching to tackle a popular way of encoding binary data in text. One format for transmitting binary data through networks of devices with potentially limited resources is to convert it to simple lines of text. This is known as "base64" encoding and is still used in things like inline email attachments for images. The 64 comes from the fact that this encoding uses 6-bit chunks, and 2 to the 6th power is 64. We use digits, lowercase letters, uppercase letters, and two other characters that are more or less arbitrarily chosen, typically the plus (+) and the forward slash (/).3
For this encoding, values 0 through 25 are the uppercase letters A through Z. Values 26 through 51 are the lowercase letters a through z. Values 52 through 61 are the digits 0 through 9, and finally, value 62 is the plus sign, and 63 is the forward slash.
But aren't bytes 8 bits long? Yes, they are. That's exactly where all of our recent topics come into play! Three 8-bit bytes hold 24 bits, which is exactly four 6-bit chunks, so we can use this new knowledge to turn every three bytes into four base64 characters.
Figure 4-3 shows a small example of converting three bytes into a string of base64 text. These happen to be the first few bytes of a valid JPEG file, but you could work on any source you like. This is a fairly trivial bit of binary data, of course, but it will validate our algorithm.
We have nine bytes total to encode in our example, but really we just want to take things three bytes at a time, like the illustration, and repeat. Sounds like a job for a loop! We could use any of our loops, but we'll go with a for loop since we know where to start and end, and we can count up by threes. We'll pull out three bytes from the source array into three variables, just for convenience of discussion.
```c
unsigned char source[9] = { 0xd8, 0xff, 0xe0, 0xff, 0x10, 0x00, 0x46, 0x4a, 0x46 };
char buffer[4] = { 0, 0, 0, 0 };

for (int i = 0; i < 9; i += 3) {
  unsigned char byte1 = source[i];
  unsigned char byte2 = source[i + 1];
  unsigned char byte3 = source[i + 2];
  // ...
}
```
The next big step is getting the four 6-bit chunks into our buffer. We can use our bitwise operators to grab what we need. Look back at Table 4-3. The leftmost six bits of byte1 make up our first 6-bit chunk. In this case, we can just shift those six bits to the right two slots:
```c
buffer[0] = byte1 >> 2;
```
Neat! One down, three to go. The second 6-bit chunk, though, is a little messy because it uses the two remaining bits from byte1 and four bits from byte2. There are several ways to do this, but we'll process the bits in order and just break up the assignment to the next slot in buffer into two steps:
```c
buffer[1] = (byte1 & 0x03) << 4;
buffer[1] |= (byte2 & 0xf0) >> 4;
```
First, take the right two bits from byte1 and scoot them to the left four spaces to make room for the rest of our 6-bit chunk.
Now, take the left four bits from byte2, scoot them to the right four spaces, and put them into buffer[1] without disturbing the two bits we placed there in the first step.
Halfway there! We can do something very similar for the third 6-bit chunk:
```c
buffer[2] = (byte2 & 0x0f) << 2;
buffer[2] |= (byte3 & 0xc0) >> 6;
```
In this case, we take the right four bits of byte2 and scoot them over two slots to make room for the left two bits of byte3. But like before, we have to scoot those two bits all the way to the right first.
Our last 6-bit chunk is another easy one. We just want the right six bits of byte3, no scooting required:
```c
buffer[3] = byte3 & 0x3f;
```
Hooray! We have successfully done the 3x8-bit to 4x6-bit conversion! Now we just need to print out each of the values in our buffer array. Sounds like another loop. And if you recall that we have five ranges for our base64 "digits," that calls for a conditional of some sort. We could list out all 64 cases in a switch, but that feels tedious. (It would be very self-documenting, at least.) An if/else if chain should do nicely. Inside any particular branch, we'll do a little character math to get the correct value. As you read this next snippet, see if you can figure out how that character math is working its magic:
```c
for (int b = 0; b < 4; b++) {
  if (buffer[b] < 26) {
    // value 0 - 25, so uppercase letter
    printf("%c", 'A' + buffer[b]);
  } else if (buffer[b] < 52) {
    // value 26 - 51, so lowercase letter
    printf("%c", 'a' + (buffer[b] - 26));
  } else if (buffer[b] < 62) {
    // value 52 - 61, so a digit
    printf("%c", '0' + (buffer[b] - 52));
  } else if (buffer[b] == 62) {
    // our "+" case, no need for math, just print it
    printf("+");
  } else if (buffer[b] == 63) {
    // our "/" case, no need for math, just print it
    printf("/");
  } else {
    // Yikes! Error. We should never get here.
    // Print the bad value as a number (%d); it is not a printable character.
    printf("\n\nError! Bad 6-bit value: %d\n", buffer[b]);
  }
}
```
Does the character math make sense? Since char is an integer type, you can "add" to characters. If we add one to the character A, we get B. Add two to A and we get C, etc. (This relies on the letters having consecutive codes, which is true in ASCII and its descendants, the character sets you'll find on virtually every modern system. C itself guarantees only that the digits 0 through 9 are consecutive.) For the lowercase letters and the digits, we first have to realign our buffered value so it is in a range starting at zero. The last two cases are easy, since we have one value that maps directly to one character. Hopefully, we never hit our else clause, but that is exactly what those clauses are for. If we got something wrong, print out a warning!
Whew! Those are some impressive moving parts. And if you want to build tiny devices that communicate with other tiny devices or the cloud, like a tiny security camera sending a picture to your phone, these are exactly the kind of moving parts you'll bump into.
Let's assemble them in one listing (ch04/encode64.c) with the other bits we need for a valid C program:
```c
#include <stdio.h>

int main() {
  // Manually specify a few bytes to encode for now
  unsigned char source[9] = { 0xd8, 0xff, 0xe0, 0xff, 0x10, 0x00, 0x46, 0x4a, 0x46 };
  char buffer[4] = { 0, 0, 0, 0 };
  // sizeof(char) == 1 byte, so the array's size in bytes is also its length
  int source_length = sizeof(source);

  for (int i = 0; i < source_length; i++) {
    printf("0x%02x ", source[i]);
  }
  printf("==> ");

  // Work three bytes at a time (assumes source_length is a multiple of 3)
  for (int i = 0; i < source_length; i += 3) {
    unsigned char byte1 = source[i];
    unsigned char byte2 = source[i + 1];
    unsigned char byte3 = source[i + 2];

    // Now move the appropriate bits into our buffer
    buffer[0] = byte1 >> 2;
    buffer[1] = (byte1 & 0x03) << 4;
    buffer[1] |= (byte2 & 0xf0) >> 4;
    buffer[2] = (byte2 & 0x0f) << 2;
    buffer[2] |= (byte3 & 0xc0) >> 6;
    buffer[3] = byte3 & 0x3f;

    for (int b = 0; b < 4; b++) {
      if (buffer[b] < 26) {
        // value 0 - 25, so uppercase letter
        printf("%c", 'A' + buffer[b]);
      } else if (buffer[b] < 52) {
        // value 26 - 51, so lowercase letter
        printf("%c", 'a' + (buffer[b] - 26));
      } else if (buffer[b] < 62) {
        // value 52 - 61, so a digit
        printf("%c", '0' + (buffer[b] - 52));
      } else if (buffer[b] == 62) {
        // our "+" case, no need for math, just print it
        printf("+");
      } else if (buffer[b] == 63) {
        // our "/" case, no need for math, just print it
        printf("/");
      } else {
        // Yikes! Error. We should never get here.
        printf("\n\nError! Bad 6-bit value: %d\n", buffer[b]);
      }
    }
  }
  printf("\n");
  return 0;
}
```
As always, I encourage you to type in the program yourself, making any adjustments you want or adding any comments to help you remember what you learned. You can also compile the encode64.c file and then run it. Here's the output:
```
ch04$ gcc encode64.c
ch04$ ./a.out
0xd8 0xff 0xe0 0xff 0x10 0x00 0x46 0x4a 0x46 ==> 2P/g/xAARkpG
```
Very, very cool. Congratulations, by the way! That is a nontrivial bit of code there. You should be proud. But if you want to really test your skills, try writing your own decoder to reverse this process. If you start with the output above, do you get the original nine bytes? (You can check your answer against mine: ch04/decode64.c.)
Conversion Answers
Whether or not you tackle decoding the base64 encoded string, hopefully you tried converting the values in Table 4-2 yourself. You can compare your answers here. Or use the rosetta.c program!
Decimal | Binary | Octal | Hexadecimal |
---|---|---|---|
14 | 0000 1110 | 016 | 0E |
32 | 0010 0000 | 040 | 20 |
17 | 0001 0001 | 021 | 11 |
50 | 0011 0010 | 062 | 32 |
42 | 0010 1010 | 052 | 2A |
19 | 0001 0011 | 023 | 13 |
167 | 1010 0111 | 247 | A7 |
249 | 1111 1001 | 371 | F9 |
Next Steps
C's support of simple arrays opens up a wide world of storage and retrieval options for just about any type of data. You do have to pay attention to the number of elements that you expect to use, but within those bounds, C's arrays are quite efficient. And if you are only storing small, yes or no, on or off type values, C has several operators that make it possible to squeeze those values into the individual bits of a larger data type like an int. Modern desktops rarely require that much attention to detail, but some of our Arduino options in the latter half of this book care very much!
So what's next? Well, our programs are getting interesting enough that we'll want to start breaking the logic up into manageable slices. Think about this book, for example. It is not made up of one, excessive run-on sentence. It is broken into chapters. Those chapters, in turn, are broken into sections. Those sections are broken into paragraphs. It is usually easier to discuss a single paragraph than it is an entire book. C allows you to perform this type of breakdown for your own logic. And once you have the logic in digestible blocks, you can use those blocks just like we have been doing with the printf() and scanf() functions. Let's dive in!
1 Exactly how things go wrong may vary. Your operating system or version, compiler version, or even the conditions on your system at runtime can all affect the output. The point is to be careful not to overflow your arrays.
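2 The gcc -fstack-protector option can be used to detect some buffer overflows and abort the program before the overflow can be used maliciously. This is a compile-time flag that is off by default in upstream gcc, although some Linux distributions configure their gcc to enable it (or a stronger variant) by default.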
3 As an example of an alternative pair of extra characters, the base64url variation uses a minus ("-") and underscore ("_").
Get Smaller C now with the O’Reilly learning platform.