Chapter 4. Bits and (Many) Bytes

Before we start building more complex programs with things like functions in Chapter 5, we should cover two more useful storage categories in C: arrays and individual bits. These aren’t really distinct types like int or double, but they are useful when dealing with tiny things or with lots of things. Indeed, the notion of an array, a sequential list of items, is so useful we had to cheat back in “Getting User Input” and use it without much explanation to store user input in the form of a string.

We have also discussed the idea of Boolean values that are either yes or no, true or false, 1 or 0. When dealing with microcontrollers in particular, you will regularly have a small collection of sensors or switches that are providing on/off values. C’s normal storage options would mean devoting an entire char (8 bits) or int (16 bits) to keeping track of such tiny values. That feels like a bit (ha!) of a waste, and it is. C has a few tricks you can employ to store this type of information more efficiently. In this chapter, we’ll tackle both the big stuff by declaring arrays and then accessing and manipulating their contents, as well as how to work with the smallest bits (ahem). (And I promise not to make more bit puns. Mostly.)

Storing Multiple Things with Arrays

It is almost impossible to find a C program tackling real-world problems that does not use arrays. If you have to work with any collection of values of any type at all, those values will almost certainly wind up in an array. A list of grades, a list of students, the list of US state abbreviations, etc., etc., etc. Even our tiny machines can use arrays to track the colors on a strip of LEDs. It is not an exaggeration to say arrays are ubiquitous in C, so let’s take a closer look at how to use them.

Creating and Manipulating Arrays

As I mentioned, we used an array back in Chapter 2 (in “Getting User Input”) to allow for some user input. Let’s revisit that code (ch04/hello2.c) and pay more attention to the array of characters:

#include <stdio.h>

int main() {
  char name[20];

  printf("Enter your name: ");
  scanf("%s", name);
  printf("Well hello, %s!\n", name);
}

So what exactly does that char name[20] declaration do? It creates a variable named “name” with a base type of char, but it is an array, so you get space to store multiple chars. In this case, we asked for 20 bytes, as illustrated in Figure 4-1.

Figure 4-1. An empty array of type char called name

And what happens with this array variable when we run the program? When you type in a name and hit Return on your keyboard, the characters you typed get placed in the array. Since we used scanf() and its string (%s) format field, we will automatically get a trailing null character ('\0' or sometimes '\000') that marks the end of the string. In memory, the name variable now looks like Figure 4-2.

Figure 4-2. A char array with a string
Note

The null character at the end of the array is a peculiarity of strings; it is not how other types of arrays are managed. Strings are often stored in arrays that are set up before the length of the string is known, and use this '\0' sentinel much like we did in “The while Statement” to mark the end of useful input. All string-processing functions in C expect to see this terminating character, and you can count on its existence in your own work with strings.

Now when we use the name variable again in the subsequent printf() call, we can echo back all of the letters that were stored and the null character tells printf() when to stop, even if the name doesn’t occupy the entire array. Conversely, printing a string that does not have the terminating character will cause printf() to keep going after the end of the array and likely cause a crash.

Length versus capacity

Didn’t we allocate 20 character slots? What are they doing if our name (such as “Grace”) doesn’t occupy all of the slots? Happily, that final, null character solves this quandary rather neatly. We do indeed have room for longer names like “Alexander” or even “Grace Hopper”; the null character always marks the end, no matter how big the array is.

Warning

If you haven’t worked with characters before in C or another language, the notion of a null character can be confusing. It is the character with the numeric value of 0 (zero). That is not the same thing as a space character (ASCII 32) or the digit 0 (ASCII 48) or a newline ('\n' ASCII 10). You usually don’t have to worry about adding or placing these nulls by hand, but it is important to remember they occur at the end of strings, even though they are never printed.

But what if the name was too long for the allocated array? Let’s find out! Run the program again and type in a longer name:

ch04$ ./a.out
Enter your name: @AdmiralGraceMurrayHopper
Well hello, @AdmiralGraceMurrayHopper!
*** stack smashing detected ***: terminated
Aborted (core dumped)

Interesting. So the capacity we declared is a fairly hard limit: things go wrong if we overflow an array.1 Good to know! We always need to reserve sufficient space before we use it.2

What if we didn’t know ahead of time how many slots were in an array? The C sizeof operator can help. It can tell you (in bytes) the size of variables or types. For simple types, that is the length of an int or char or double. For arrays, it is the total memory allocated. That means we can tell how many slots we have in an array as long as we know its base type. Let’s try making an array of double values, say, for an accounting ledger. We’ll pretend we don’t know how many values we can store and use sizeof to find out. Take a look at ch04/capacity.c:

#include <stdio.h>

int main() {
  double ledger[100];
  printf("Size of a double: %zu\n", sizeof (double));
  printf("Size of ledger: %zu\n", sizeof ledger);
  printf("Calculated ledger capacity: %zu\n", sizeof ledger / (sizeof (double)));
}

Notice that when asking about the size of a type, you need parentheses. The compiler needs this extra bit of context to treat the keyword as an expression. For variables like ledger that already fit the expression definition, we can leave them off. Let’s run our tiny program. Here’s the output:

ch04$ gcc capacity.c
ch04$ ./a.out
Size of a double: 8
Size of ledger: 800
Calculated ledger capacity: 100

Nice. Since we actually do know how big we made our array, we can just compare that chosen size to our calculated results. They match. (Whew!) But there are situations where you are given information from an independent source and won’t always know the size of the array. Remember that tools like sizeof exist and can help you understand that information.

Initializing arrays

So far, we’ve created empty arrays or loaded char arrays with input from the user at runtime. Just like simpler variable types, C allows you to initialize arrays when you define them.

For any array, you can supply a list of values inside a pair of curly braces, separated by commas. Here are a few examples:

int days_in_month[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
char vowels[6] = { 'a', 'e', 'i', 'o', 'u', 'y' };
float readings[7] = { 8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5 };

Notice that the declared size of the array matches the number of values supplied to initialize the array. In this situation, C allows a nice shorthand: you can omit the explicit size in between the square brackets. The compiler will allocate the correct amount of memory to fit the initialization list exactly. This means we could rewrite our previous snippet like this:

int days_in_month[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
char vowels[] = { 'a', 'e', 'i', 'o', 'u', 'y' };
float readings[] = { 8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5 };

Strings, however, are a special case. C supports the notion of string literals. This means you can use a sequence of characters between double quotes as a value. You can use a string literal to initialize a char[] variable. You can also use it almost anywhere a string variable would be allowed. (We saw this in “The Ternary Operator and Conditional Assignment” where we used the ternary operator (?:) to print true and false values as words instead of as 1 or 0.)

// Special initialization of a char array with a string literal
char secret[] = "password1";

// The printf() format string is usually a string literal
printf("Hello, world!\n");

// And we can print literals, too
printf("The value stored in %s is '%s'\n", "secret", secret);

You can also initialize a string by supplying individual characters inside curly braces, but that is generally harder to read. You have to remember to include the terminating null character, and this verbose option doesn’t provide any other real advantage over the use of a string literal.

Accessing array elements

Once you have an array created, you can access individual elements inside the array using square brackets. You give an index number inside the square brackets, where the first element has an index value of 0. To print the second vowel or the days in July from our earlier arrays, for example:

  printf("The second vowel is: %c\n", vowels[1]);
  printf("July has %d days.\n", days_in_month[6]);

These statements would produce the following output if bundled into a complete program:

The second vowel is: e
July has 31 days.

But the value we supply inside the square brackets does not need to be a fixed number. It can be any expression that results in an integer. (If you have enough memory, it could be a long or other, larger integer type.) This means you can use a calculation or a variable as your index. For example, if we store the “current month” in a variable and use the typical values for months—January is 1, February is 2, and so on—then we could print the number of days in July using the following code:

  int month = 7;
  printf("July (month %d) has %d days.", month, days_in_month[month - 1]);

The ease and flexibility of accessing these members is part of what makes arrays so popular. After a bit of practice, you’ll find them indispensable!

Warning

The value inside the square brackets needs to be “in bounds” or you’ll get unpredictable results at runtime. For example, if you tried printing the days in the 15th month like we tried for July, you’d see something like “Invalid (month 15) has -1574633234 days.” C won’t stop you (note we did not cause a crash), but neither did we get a usable value. And assigning values (which we discuss next) to invalid slots in an array is how you cause a buffer overflow. This classic security exploit gets its name from the notion of an array as a storage buffer. You “overflow” it by assigning values to slots outside the actual array. If you get lucky (or are very devious), you can write executable code and trick the computer into running your commands instead of the intended program.

Changing array elements

You can also change the value of a given array position using the square bracket notation. For example, we could alter the number of days in February to accommodate a leap year:

if (year % 4 == 0) {
  // Forgive the naive leap year calculation :)
  days_in_month[1] = 29;
}

This type of post-declaration assignment is handy (or often even necessary) when you have more dynamic data. With the Arduino projects we’ll cover later, for example, you might want to keep the 10 most recent sensor readings. You won’t have those readings when you declare your array. So you can set aside 10 slots, and just fill them in later:

float readings[10];
// ... interesting stuff goes here to set up the sensor and read it
readings[7] = latest_reading;

Just make sure you supply a value of the same type as (or at least compatible with) the array. Our readings array, for example, is expecting floating point numbers. If we were to assign a character to one of the slots, it would “fit” in that slot, but it would produce a strange answer. Assigning the letter x to readings[8] would end up putting the ASCII value of lowercase x (120) in the slot as a float value of 120.0.

Iterating through arrays

The ability to use a variable as an index makes working with an entire array a simple loop task. We could print out all the days_in_month counts using a for loop, for example:

for (int m = 0; m < 12; m++) {
  // remember the array starts at 0, but humans start at 1
  printf("Days in month %d is %d.\n", m + 1, days_in_month[m]);
}

This snippet produces the following output. We can get a sense of just how powerful the combination of arrays and loops could be. With just a tiny bit of code, we get some fairly interesting output:

Days in month 1 is 31.
Days in month 2 is 28.
Days in month 3 is 31.
Days in month 4 is 30.
Days in month 5 is 31.
Days in month 6 is 30.
Days in month 7 is 31.
Days in month 8 is 31.
Days in month 9 is 30.
Days in month 10 is 31.
Days in month 11 is 30.
Days in month 12 is 31.

You’re free to use the elements of your array however you need to. You aren’t limited to printing them out. As another example, we could calculate the average reading from our readings array like so:

float readings[] = { 8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5 };

// Use our sizeof trick to get the number of elements
int count = sizeof readings / sizeof (float);
float total = 0.0;
float average;
for (int r = 0; r < count; r++) {
  total += readings[r];
}
average = total / count;
printf("The average reading is %0.2f\n", average);

This example highlights just how much C you have learned in only a few chapters! If you want some more practice, build this snippet into a complete program. Compile and run it to make sure you have it working. (The average should be 8.70, by the way.) Then add some more variables to capture the highest and lowest readings. You’ll need some if statements to help there. You can see one possible solution in arrays.c in the examples for this chapter.

Review of Strings

I have noted that strings are really just arrays of type char with some extra features supported by the language itself, such as literals. But since strings represent the easiest way to communicate with users, I want to highlight more of what you can do with strings in C.

Initializing strings

We have already seen how to declare and initialize a string. If you know the value of the string ahead of time, you can use a literal. If you don’t know the value, you can still declare the variable and then use scanf() to ask the user what text to store. But what if you wanted to do both? Assign an initial default and then let the user supply an optional new value that overrides the default?

Happily, you can get there, but you do have to plan ahead a little. It might be tempting to use the default value when you first declare your variable, and then let the user provide a different value at runtime if they want. This works, but it requires an extra question to the user (“Do you want to change the background color, yes or no?”) and also assumes the user will supply a valid value as an alternative. Such assumptions are often safe as you are likely the only user while you’re learning a new language. But in programs you share with others, it’s better not to assume what the user will do.

String literals also make it tempting to think you can simply overwrite an existing string just like you can with int or float variables. But a string really is just a char[], and arrays are not assignable beyond the optional initialization when you declare them.

These limitations can all be overcome with the use of things like functions, which we’ll explore in Chapter 5. In fact, the functions that make it possible to manipulate strings at runtime are so useful that they have been bundled up into their own library, which I cover in “stdlib.h”.

For now, I want you to remember that string literals can make the initialization of character arrays simple and readable, but that at their heart, strings in C are not like numbers and individual characters.

Accessing individual characters

But I do want to reiterate that strings are just arrays. You can access individual characters in your string using the same syntax you use to access the members of any other array. For example, we could find out if a given phrase contains a comma by looking at each character in the phrase. Here’s ch04/comma.c:

#include <stdio.h>

int main() {
  char phrase[] = "Hello, world!";
  int i = 0;
  // keep looping until the end of the string
  while (phrase[i] != '\0') {
    if (phrase[i] == ',') {
      printf("Found a comma at position %d.\n", i);
      break;
    }
    // try the next character
    i++;
  }
  if (phrase[i] == '\0') {
    // Rats. Made it to the end of the string without a match.
    printf("No comma found in %s\n", phrase);
  }
}

This program actually uses the array nature of the string a few times. Our loop condition depends on accessing a single character of the string just like the if condition that helps answer our original question. And we test an individual character at the very end to see if we found something or not. We’ll look at several string-related functions in Chapter 7, but hopefully you see how you could accomplish things like copying or comparing strings using a good loop and the square brackets to march one character at a time through your array.

Multidimensional Arrays

It may not be obvious since strings are already an array, but you can store an array of strings in C. But because there is no “string type” that you can use when declaring such an array, how do you do it? Turns out C supports the idea of a multidimensional array so you can create an array of char[] just like other arrays:

char month_names[][];

Seems fair. But what is not obvious in that declaration is what the pair of square bracket pairs refer to. When declaring a two-dimensional array like this, the first square bracket pair can be thought of as the row index, and the second is the column. Another way to think about it is the first index tells you how many character arrays we’ll be storing and the second index tells you how long each of those arrays can be.

We know how many months there are and a little research tells us the longest name is September, with nine letters. Add on one more for our terminating null character, and each name needs at least 10 slots. We could define our month_names array like this, with one slot to spare:

char month_names[12][11];

You could also initialize this two-dimensional array since we know the names of the months and don’t require user input:

char month_names[12][11] = {
  "January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"
};

But here I cheated a little with the initialization by using string literals, so the second dimension of the month_names array isn’t readily apparent. The first dimension is the months, and the second (hidden) dimension is the individual characters that make up the month names. If you are working with other data types that don’t have this string literal shortcut, you can use nested curly brace lists like this:

int multiplication[5][5] = {
  { 0, 0, 0,  0,  0 },
  { 0, 1, 2,  3,  4 },
  { 0, 2, 4,  6,  8 },
  { 0, 3, 6,  9, 12 },
  { 0, 4, 8, 12, 16 }
};

It might be tempting to assume the compiler can determine the size of the multi-dimensional structure, but sadly, you must supply the capacity for each dimension beyond the first. For our month names, for example, we could start off without the “12” for how many names, but not without the “11” indicating the maximum length of any individual name:

// This shortcut is ok
char month_names[][11] = { "January", "February" /* ... */ };

// This shortcut is NOT
char month_names[][] = { "January", "February" /* ... */ };

You’ll eventually internalize these rules, but the compiler (and many editors) will always be there to catch you if you make a small mistake.

Accessing Elements in Multidimensional Arrays

With our array of month names, it is straightforward getting access to any particular month. It looks just like accessing the element of any other one-dimensional array:

printf("The name of the first month is: %s\n", month_names[0]);

// Output: The name of the first month is: January

But how would we access an element in the multiplication two-dimensional array? We use two indices:

printf("Looking up 3 x 4: %d\n", multiplication[3][4]);

// Output: Looking up 3 x 4: 12

Notice that in this multiplication table, the potentially strange use of zero as the first index value turns out to be a useful element. Index “0” gives us a row—or column—of valid multiplication answers.

And with two indices, you’ll need two loops if you want to print out all of the data. We can take the work we did in “Nested Loops and Tables” and use it to access our stored values rather than generating the numbers directly. Here’s the printing snippet from ch04/print2d.c:

  for (int row = 0; row < 5; row++) {
    for (int col = 0; col < 5; col++) {
      printf("%3d", multiplication[row][col]);
    }
    printf("\n");
  }

And here is our nicely formatted table:

ch04$ gcc print2d.c
ch04$ ./a.out
  0  0  0  0  0
  0  1  2  3  4
  0  2  4  6  8
  0  3  6  9 12
  0  4  8 12 16

We’ll see some other options in Chapter 6 for more tailored multidimensional storage. In the near term, just remember that you can create more dimensions with more pairs of square brackets. While you’ll likely use one-dimensional arrays most of the time, tables are common enough and spatial data often fits in three-dimensional “cubes.” Few programmers will ever need it, especially those of us concentrating on microcontrollers, but C does support higher orders of arrays.

Storing Bits

Arrays allow us to store truly vast quantities of data with relative ease. At the other end of the spectrum, C has several operators that you can use to manipulate very small amounts of data. Indeed, you can work with the absolute smallest pieces of data: individual bits.

When C was developed in the 1970s, every byte of memory was expensive, and therefore precious. As I noted at the beginning of the chapter, if you had a particular variable that stored Boolean answers, using 16 bits for an int or even just 8 bits for a char would be a little wasteful. If you had an array of such variables, it could become very wasteful. Desktop computers these days can manage that type of waste without blinking an eye (or an LED), but our microcontrollers often need all the storage help they can get.

Binary, Octal, Hexadecimal

Before we tackle the operators in C that access and manipulate bits, let’s review some notation for discussing binary values. If we have a single bit, a 0 or a 1 is sufficient, and that’s easy enough. However, if we want to store a dozen bits inside one int variable, we need a way to describe the value of that int. Technically, the int will have a decimal (base 10) representation, but base 10 does not map cleanly to individual bits. For that, octal and hexadecimal notations are much clearer. (Binary, or base 2, notation would obviously be clearest, but large numbers get very long in binary. Octal and hexadecimal, the latter often just “hex,” are a good compromise.)

When we talk about numbers, we often implicitly use base 10, thanks to the digits (ooh, get it?) on our hands. Computers don’t have hands (discounting robots, of course) and don’t count in base 10. They use binary. Two digits, 0 and 1, make up the entirety of their world. If you group three binary digits, you can represent the decimal numbers 0 through 7, which is eight total numbers, so this is base 8, or octal. Add a fourth bit and you can represent 0 through 15, which covers the individual “digits” in hexadecimal. Table 4-1 shows these first 16 values in all four bases.

Table 4-1. Numbers in decimal, binary, octal, and hexadecimal
Decimal  Binary     Octal  Hexadecimal    Decimal  Binary     Octal  Hexadecimal
 0       0000 0000  000    0x00            8       0000 1000  010    0x08
 1       0000 0001  001    0x01            9       0000 1001  011    0x09
 2       0000 0010  002    0x02           10       0000 1010  012    0x0A / 0x0a
 3       0000 0011  003    0x03           11       0000 1011  013    0x0B / 0x0b
 4       0000 0100  004    0x04           12       0000 1100  014    0x0C / 0x0c
 5       0000 0101  005    0x05           13       0000 1101  015    0x0D / 0x0d
 6       0000 0110  006    0x06           14       0000 1110  016    0x0E / 0x0e
 7       0000 0111  007    0x07           15       0000 1111  017    0x0F / 0x0f

You might notice that I always showed eight digits for the binary column, three for octal, and two for hex. The byte (8 bits) is a very common unit to work with in C. Binary numbers often get shown in groups of four, with as many groups as required to cover the largest number being discussed. So for a full byte of 8 bits, which can store any value between 0 and 255, for example, you would see a binary value with two groupings of four digits. Similarly, octal values with three digits can display any byte’s value, and hexadecimal numbers need two digits. Note also that hexadecimal literals are not case sensitive. (Neither is the “x” in the hexadecimal prefix, but an uppercase “X” can be harder to distinguish.)

We’ll be using binary notation from time to time when working with microcontrollers in the latter half of this book, but you may have already run into hexadecimal numbers if you have written any styled text in HTML or CSS or similar markup languages. Colors in these documents are often represented with the hex values for a byte of red, a byte of green, a byte of blue, and occasionally a byte of alpha (transparency). So a full red that ignores the alpha channel would be FF0000. Now that you know two hex digits can represent one byte, it may be easier to read such color values.

To help you get accustomed to these different bases, try filling out the missing values in Table 4-2. (You can check your answers with Table 4-4 at the end of the chapter.) The numbers are not in any particular order, by the way. I want to keep you on your toes!

Table 4-2. Converting between bases
Decimal Binary Octal Hexadecimal

14

016

0010 0000

021

11

50

32

052

13

167

1111 1001

Modern browsers can convert bases for you right in the search bar, so you probably won’t need to memorize the full 256 values possible in a byte. But it will still be useful if you can estimate the size of a hex value or determine if an octal ASCII code is probably a letter or a number.

Octal and Hexadecimal Literals in C

The C language has special options for expressing numeric literals in octal and hex. Octal literals start with a simple 0 as a prefix, although you can have multiple zeroes if you are keeping all of your values the same width, like we did in our base tables. For hex values, you use the prefix 0x or 0X. You typically match the case of the ‘X’ character to the case of any of the A-F digits in your hex value, but this is just a convention.

Here’s a snippet showing how to use some of these prefixes:

int line_feed = 012;
int carriage_return = 015;
int red = 0xff;
int blue = 0x7f;

Some compilers support nonstandard prefixes or suffixes for representing binary literals, but as the “nonstandard” qualifier suggests, they are not part of the official C language.

Input and Output of Octal and Hex Values

The printf() function has built-in format specifiers to help you produce octal or hexadecimal output. Octal values can be printed with the %o specifier and hex can be shown with either %x or %X, depending on whether you want lower- or uppercase output. These specifiers can be used with variables or expressions of any of the integer types in any base, which makes printf() a pretty easy way to convert from decimal to octal or hex. We could easily produce a table similar to Table 4-1 (minus the binary column) using a loop and a single printf(). We can take advantage of the width and padding options of the format specifier to get our desired three octal digits and two hex digits. Take a look at ch04/dec_oct_hex.c:

#include <stdio.h>

int main() {
  printf(" Dec  Oct  Hex\n");
  for (int i = 0; i < 16; i++) {
    printf(" %3d  %03o  0x%02X\n", i, i, i);
  }
}

Notice that we just reuse the exact same variable for each of the three columns. Also notice that when printing the hexadecimal version, I manually added the “0x” prefix—it is not included in the %x or %X formats. Here are a few of the first and last lines:

ch04$ gcc dec_oct_hex.c
ch04$ ./a.out
 Dec  Oct  Hex
   0  000  0x00
   1  001  0x01
   2  002  0x02
   3  003  0x03
 ...
  13  015  0x0D
  14  016  0x0E
  15  017  0x0F

Neat. Just the output we wanted. On the input side using scanf(), the format specifiers work in an interesting way. They are all still used to get numeric input from the user. The different specifiers now perform base conversion on the number you enter. If you specify decimal input (%d), you cannot use hex values. Conversely, if you specify hex input (%x or %X) and only enter numbers (i.e., you don’t use any of the A-F digits), the number will still be converted from base 16.

Note

The specifiers %d and %i are normally interchangeable. In a printf() call, they will result in identical output. In a scanf() call, however, the %d option requires you to enter a simple base 10 number. The %i specifier allows you to use the various C literal prefixes to enter a value in a different base, such as 0x to enter a hexadecimal number.

We can illustrate this with a simple converter program, ch04/rosetta.c, that will translate different inputs to all three bases on output. The program asks which type of input to expect and then uses an if/else if/else block to pick the matching scanf() specifier:

#include <stdio.h>

int main() {
  char base;
  int input;

  printf("Convert from? (d)ecimal, (o)ctal, he(x): ");
  scanf("%c", &base);

  if (base == 'o') {
    // Get octal input
    printf("Please enter a number in octal: ");
    scanf("%o", &input);
  } else if (base == 'x') {
    // Get hex input
    printf("Please enter a number in hexadecimal: ");
    scanf("%x", &input);
  } else {
    // assume decimal input
    printf("Please enter a number in decimal: ");
    scanf("%d", &input);
  }
  printf("Dec: %d,  Oct: %o,  Hex: %x\n", input, input, input);
}

Here are a few example runs:

ch04$ gcc rosetta.c

ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): d
Please enter a number in decimal: 55
Dec: 55,  Oct: 67,  Hex: 37

ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): x
Please enter a number in hexadecimal: 37
Dec: 55,  Oct: 67,  Hex: 37

ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): d
Please enter a number in decimal: 0x37
Dec: 0,  Oct: 0,  Hex: 0

Interesting. The first two runs went according to plan. The third run didn’t create an error but didn’t really work, either. What happened here is a sort of “feature” of scanf(). It tried very hard to bring in a decimal number. It found the character 0 in our input, which is a valid decimal digit, so it started parsing that character. But it next encountered the x character which is not valid for a base 10 number. So that was the end of the parsing and our program converted the value 0 into each of the three bases.

Try running this program yourself and switch the mode a few times. Do you get the behavior you expect? Can you cause any errors?

Knowing what we do about the difference between %i and other numeric specifiers in scanf(), can you see how to make this program a little simpler? It should be possible to accept any of the three bases for input without the big if statement. I’ll leave this problem to you as an exercise, but you can see one possible solution in the rosetta2.c file in the code examples for this chapter.

Bitwise Operators

Starting out on limited hardware like C did means occasionally working with data at the bit level quite apart from printing or reading in binary data. C supports this work with bitwise operators. These operators allow you to tweak individual bits inside int variables (or char or long, of course). We’ll see some fun uses of these features with the Arduino microcontroller in Chapter 10.

Table 4-3 describes these operators and shows some examples that make use of the following two variables:

char a = 0xD; // 1101 in binary
char b = 0x7; // 0111 in binary

Table 4-3. Bitwise operators in C
Operator  Name         Description                                    Example
&         bitwise and  Both bits must be 1 to yield a 1               a & b == 0101
|         bitwise or   Either bit can be 1 to yield a 1               a | b == 1111
~         bitwise not  Yields the opposite of the input bit           ~a == 0010 (low four bits)
^         bitwise xor  eXclusive OR, bits that don’t match yield a 1  a ^ b == 1010
<<        left shift   Move bits to the left by a number of places    a << 3 == 0110 1000
>>        right shift  Move bits to the right by a number of places   b >> 2 == 0001

Bitwise operators work only on integer types such as char, int, and long; the compiler will reject them on floating point types like float or double. You usually pick an integer type that is big enough to hold however many individual bits you need. Because they are “editing” the bits of a given variable, you often see them used with compound assignment operators (op=). If you have five LEDs, for example, you could keep track of their on/off state with a single char type variable, as in this snippet:

char leds = 0;  // Start with everyone off, 0000 0000

leds |= 8;    // Turn on the 4th led from the right, 0000 1000
leds ^= 0x1f; // Toggle all lights, 0001 0111
leds &= 0x0f; // Turn off 5th led, leave others as is, 0000 0111

Five int or char values likely won’t make the difference in whether you can store or run a program on a microcontroller, even ones with only one or two kilobytes of memory, but those small storage needs do add up. If you’re tracking a panel of LEDs with hundreds or thousands of lights, it makes a difference how tightly you can store their state. One size rarely fits all, so remember your options and pick one that balances between ease of use and any resource constraints you have.

Mixing Bits and Bytes

We now have enough elements of C under our belts to start writing some really interesting code. We can combine all of our previous discussions on bits, arrays, types, looping, and branching to tackle a popular way of encoding binary data in text. One format for transmitting binary data through networks of devices with potentially limited resources is to convert it to simple lines of text. This is known as “base64” encoding and is still used in things like inline email attachments for images. The 64 comes from the fact that this encoding uses 6-bit chunks, and 2 to the 6th power is 64. We use numbers, lowercase letters, uppercase letters, and other characters more or less arbitrarily chosen, typically the plus (+) and the forward slash (/).3

For this encoding, values 0 through 25 are the uppercase letters A through Z. Values 26 through 51 are the lowercase letters a through z. Values 52 through 61 are the digits 0 through 9, and finally, value 62 is the plus sign, and 63 is the forward slash.

But aren’t bytes 8 bits long? Yes, they are. That’s exactly where all of our recent topics come into play! We can use this new knowledge to change those 8-bit chunks into 6-bit chunks.

Figure 4-3 shows a small example of converting three bytes into a string of base64 text. These happen to be the first few bytes of a valid JPEG file, but you could work on any source you like. This is a fairly trivial bit of binary data, of course, but it will validate our algorithm.

Figure 4-3. Going from 8-bit to 6-bit chunks with encoding

We have nine bytes total to encode in our example, but really we just want to take things three bytes at a time, like the illustration, and repeat. Sounds like a job for a loop! We could use any of our loops, but we’ll go with a for loop since we know where to start and end, and we can count up by threes. We’ll pull out three bytes from the source array into three variables, just for convenience of discussion.

unsigned char source[9] = { 0xd8,0xff,0xe0,0xff,0x10,0x00,0x46,0x4a,0x46 };
char buffer[4] = { 0, 0, 0, 0 };

for (int i = 0; i < 9; i += 3) {
  unsigned char byte1 = source[i];
  unsigned char byte2 = source[i + 1];
  unsigned char byte3 = source[i + 2];
  // ...
}

The next big step is getting the four 6-bit chunks into our buffer. We can use our bitwise operators to grab what we need. Look back at Table 4-3. The leftmost six bits of byte1 make up our first 6-bit chunk. In this case, we can just shift those six bits to the right two slots:

  buffer[0] = byte1 >> 2;

Neat! One down, three to go. The second 6-bit chunk, though, is a little messy because it uses the two remaining bits from byte1 and four bits from byte2. There are several ways to do this, but we’ll process the bits in order and just break up the assignment to the next slot in buffer into two steps:

  buffer[1] = (byte1 & 0x03) << 4;   // (1)
  buffer[1] |= (byte2 & 0xf0) >> 4;  // (2)

(1) First, take the right two bits from byte1 and scoot them to the left four spaces to make room for the rest of our 6-bit chunk.

(2) Now, take the left four bits from byte2, scoot them to the right four spaces, and put them into buffer[1] without disturbing the upper half of that variable.

Halfway there! We can do something very similar for the third 6-bit chunk:

  buffer[2] = (byte2 & 0x0f) << 2;
  buffer[2] |= (byte3 & 0xc0) >> 6;

In this case, we take the right four bits of byte2 and scoot them over two slots to make room for the left two bits of byte3. But like before, we have to scoot those two bits all the way to the right first. Our last 6-bit chunk is another easy one. We just want the right six bits of byte3, no scooting required:

  buffer[3] = byte3 & 0x3f;

Hooray! We have successfully done the 3x8-bit to 4x6-bit conversion! Now we just need to print out each of the values in our buffer array. Sounds like another loop. And if you recall that we have five ranges for our base 64 “digits,” that calls for a conditional of some sort. We could list out all 64 cases in a switch, but that feels tedious. (It would be very self-documenting, at least.) An if/else if chain should do nicely. Inside any particular branch, we’ll do a little character math to get the correct value. As you read this next snippet, see if you can figure out how that character math is working its magic:

  for (int b = 0; b < 4; b++) {
    if (buffer[b] < 26) {
      // value 0 - 25, so uppercase letter
      printf("%c", 'A' + buffer[b]);
    } else if (buffer[b] < 52) {
      // value 26 - 51, so lowercase letter
      printf("%c", 'a' + (buffer[b] - 26));
    } else if (buffer[b] < 62) {
      // value 52 - 61, so a digit
      printf("%c", '0' + (buffer[b] - 52));
    } else if (buffer[b] == 62) {
      // our "+" case, no need for math, just print it
      printf("+");
    } else if (buffer[b] == 63) {
      // our "/" case, no need for math, just print it
      printf("/");
    } else {
      // Yikes! Error. We should never get here.
      printf("\n\n Error! Bad 6-bit value: %d\n", buffer[b]);
    }
  }

Does the character math make sense? Since char is an integer type, you can “add” to characters. If we add one to the character A, we get B. Add two to A and we get C, etc. For the lowercase letters and the digits, we first have to realign our buffered value so it is in a range starting at zero. The last two cases are easy, since we have one value that maps directly to one character. Hopefully, we never hit our else clause, but that is exactly what those clauses are for. If we got something wrong, print out a warning!

Whew! Those are some impressive moving parts. And if you want to build tiny devices that communicate with other tiny devices or the cloud, like a tiny security camera sending a picture to your phone, these are exactly the kind of moving parts you’ll bump into.

Let’s assemble them in one listing (ch04/encode64.c) with the other bits we need for a valid C program:

#include <stdio.h>

int main() {
  // Manually specify a few bytes to encode for now
  unsigned char source[9] = { 0xd8,0xff,0xe0,0xff,0x10,0x00,0x46,0x4a,0x46 };
  char buffer[4] = { 0, 0, 0, 0 };

  // sizeof(char) == 1 byte, so the array's size in bytes is also its length
  int source_length = sizeof(source);
  for (int i = 0; i < source_length; i++) {
    printf("0x%02x ", source[i]);
  }
  printf("==> ");
  for (int i = 0; i < source_length; i += 3) {
    unsigned char byte1 = source[i];
    unsigned char byte2 = source[i + 1];
    unsigned char byte3 = source[i + 2];

    // Now move the appropriate bits into our buffer
    buffer[0] = byte1 >> 2;
    buffer[1] = (byte1 & 0x03) << 4;
    buffer[1] |= (byte2 & 0xf0) >> 4;
    buffer[2] = (byte2 & 0x0f) << 2;
    buffer[2] |= (byte3 & 0xc0) >> 6;
    buffer[3] = byte3 & 0x3f;

    for (int b = 0; b < 4; b++) {
      if (buffer[b] < 26) {
        // value 0 - 25, so uppercase letter
        printf("%c", 'A' + buffer[b]);
      } else if (buffer[b] < 52) {
        // value 26 - 51, so lowercase letter
        printf("%c", 'a' + (buffer[b] - 26));
      } else if (buffer[b] < 62) {
        // value 52 - 61, so a digit
        printf("%c", '0' + (buffer[b] - 52));
      } else if (buffer[b] == 62) {
        // our "+" case, no need for math, just print it
        printf("+");
      } else if (buffer[b] == 63) {
        // our "/" case, no need for math, just print it
        printf("/");
      } else {
        // Yikes! Error. We should never get here.
        printf("\n\n Error! Bad 6-bit value: %d\n", buffer[b]);
      }
    }
  }
  printf("\n");
}

As always, I encourage you to type in the program yourself, making any adjustments you want or adding any comments to help you remember what you learned. You can also compile the encode64.c file and then run it. Here’s the output:

ch04$ gcc encode64.c
ch04$ ./a.out
0xd8 0xff 0xe0 0xff 0x10 0x00 0x46 0x4a 0x46 ==> 2P/g/xAARkpG

Very, very cool. Congratulations, by the way! That is a nontrivial bit of code there. You should be proud. But if you want to really test your skills, try writing your own decoder to reverse this process. If you start with the output above, do you get the original nine bytes? (You can check your answer against mine: ch04/decode64.c.)

Conversion Answers

Whether or not you tackle decoding the base64 encoded string, hopefully you tried converting the values in Table 4-2 yourself. You can compare your answers here. Or use the rosetta.c program!

Table 4-4. Base conversion answers
Decimal  Binary     Octal  Hexadecimal
14       0000 1110  016    0E
32       0010 0000  040    20
17       0001 0001  021    11
50       0011 0010  062    32
42       0010 1010  052    2A
19       0001 0011  023    13
167      1010 0111  247    A7
249      1111 1001  371    F9

Next Steps

C’s support of simple arrays opens up a wide world of storage and retrieval options for just about any type of data. You do have to pay attention to the number of elements that you expect to use, but within those bounds, C’s arrays are quite efficient. And if you are only storing small, yes or no, on or off type values, C has several operators that make it possible to squeeze those values into the individual bits of a larger data type like an int. Modern desktops rarely require that much attention to detail, but some of our Arduino options in the latter half of this book care very much!

So what’s next? Well, our programs are getting interesting enough that we’ll want to start breaking the logic up into manageable slices. Think about this book, for example. It is not made up of one, excessive run-on sentence. It is broken into chapters. Those chapters, in turn, are broken into sections. Those sections are broken into paragraphs. It is usually easier to discuss a single paragraph than it is an entire book. C allows you to perform this type of breakdown for your own logic. And once you have the logic in digestible blocks, you can use those blocks just like we have been doing with the printf() and scanf() functions. Let’s dive in!

1 Exactly how things go wrong may vary. Your operating system or version, compiler version, or even the conditions on your system at runtime can all affect the output. The point is to be careful not to overflow your arrays.

2 The gcc stack-protector option can be used to detect some buffer overflows and abort the program before the overflow can be used maliciously. This is a compile-time flag that is off by default.

3 As an example of an alternative pair of extra characters, the base64url variation uses a minus (“-”) and underscore (“_”).
