Chapter 4. Bits and (Many) Bytes

Before we start building more complex programs with things like functions in Chapter 5, we should cover two more useful storage categories in C: arrays and individual bits. These aren't really distinct types like int or double, but they are useful when dealing with tiny things or with lots of things. Indeed, the notion of an array, a sequential list of items, is so useful we had to cheat back in "Getting User Input" and use it without much explanation to store user input in the form of a string.

We have also discussed the idea of Boolean values that are either yes or no, true or false, 1 or 0. When dealing with microcontrollers in particular, you will regularly have a small collection of sensors or switches that are providing on/off values. C's normal storage options would mean devoting an entire char (8 bits) or int (at least 16 bits) to keeping track of such tiny values. That feels like a bit (ha!) of a waste, and it is. C has a few tricks you can employ to store this type of information more efficiently. In this chapter, we'll tackle both the big stuff, declaring arrays and then accessing and manipulating their contents, and the smallest bits (ahem). (And I promise not to make more bit puns. Mostly.)

Storing Multiple Things with Arrays

It is almost impossible to find a C program tackling real-world problems that does not use arrays. If you have to work with any collection of values of any type at all, those values will almost certainly wind up in an array. A list of grades, a list of students, the list of US state abbreviations, etc., etc., etc. Even our tiny machines can use arrays to track the colors on a strip of LEDs. It is not an exaggeration to say arrays are ubiquitous in C, so let's take a closer look at how to use them.

Creating and Manipulating Arrays

As I mentioned, we used an array back in Chapter 2 (in "Getting User Input") to allow for some user input. Let's revisit that code (ch04/hello2.c) and pay more attention to the array of characters:

#include <stdio.h>

int main() {
  char name[20];

  printf("Enter your name: ");
  scanf("%s", name);
  printf("Well hello, %s!\n", name);
}

So what exactly does that char name[20] declaration do? It creates a variable named "name" with a base type of char, but it is an array, so you get space to store multiple chars. In this case, we asked for 20 bytes, as illustrated in Figure 4-1.

And what happens with this array variable when we run the program? When you type in a name and hit Return on your keyboard, the characters you typed get placed in the array. Since we used scanf() and its string (%s) format field, we will automatically get a trailing null character ('\0' or sometimes '\000') that marks the end of the string. In memory, the name variable now looks like Figure 4-2.

Note

The null character at the end of the array is a peculiarity of strings; it is not how other types of arrays are managed. Strings are often stored in arrays that are set up before the length of the string is known, and use this '\0' sentinel much like we did in "The while Statement" to mark the end of useful input. All string-processing functions in C expect to see this terminating character, and you can count on its existence in your own work with strings.

Now when we use the name variable again in the subsequent printf() call, we can echo back all of the letters that were stored, and the null character tells printf() when to stop, even if the name doesn't occupy the entire array. Conversely, printing a string that lacks the terminating character will cause printf() to keep reading past the end of the array, printing garbage or possibly crashing the program.

Length versus capacity

Didn't we allocate 20 character slots? What are they doing if our name (such as "Grace") doesn't occupy all of the slots? Happily, that final null character solves this quandary rather neatly. We do indeed have room for longer names like "Alexander" or even "Grace Hopper" (although scanf() with %s stops reading at the first space, so you would need a different input technique to capture both words); the null character always marks the end, no matter how big the array is.

Warning

If you haven't worked with characters before in C or another language, the notion of a null character can be confusing. It is the character with the numeric value of 0 (zero). That is not the same thing as a space character (ASCII 32), the digit 0 (ASCII 48), or a newline ('\n', ASCII 10). You usually don't have to worry about adding or placing these nulls by hand, but it is important to remember they occur at the end of strings, even though they are never printed.

But what if the name was too long for the allocated array? Let's find out! Run the program again and type in a longer name:

ch04$ ./a.out
Enter your name: @AdmiralGraceMurrayHopper
Well hello, @AdmiralGraceMurrayHopper!
*** stack smashing detected ***: terminated
Aborted (core dumped)

Interesting. So the capacity we declared is a fairly hard limit: things go wrong if we overflow an array.¹ Good to know! We always need to reserve sufficient space before we use it.²

What if we didn't know ahead of time how many slots were in an array? The C sizeof operator can help. It can tell you (in bytes) the size of variables or types. For simple types, that is the length of an int or char or double. For arrays, it is the total memory allocated. That means we can tell how many slots we have in an array as long as we know its base type. Let's try making an array of double values, say, for an accounting ledger. We'll pretend we don't know how many values we can store and use sizeof to find out. Take a look at ch04/capacity.c:

#include <stdio.h>

int main() {
  double ledger[100];
  // sizeof yields a size_t value, which printf() prints with %zu
  printf("Size of a double: %zu\n", sizeof (double));
  printf("Size of ledger: %zu\n", sizeof ledger);
  printf("Calculated ledger capacity: %zu\n", sizeof ledger / (sizeof (double)));
}

Notice that when asking about the size of a type, you need parentheses: C's grammar requires them around a type name like double. For an expression such as our ledger variable, they are optional, so we can leave them off. Let's run our tiny program. Here's the output:

ch04$ gcc capacity.c
ch04$ ./a.out
Size of a double: 8
Size of ledger: 800
Calculated ledger capacity: 100

Nice. Since we actually do know how big we made our array, we can just compare that chosen size to our calculated results. They match. (Whew!) But there are situations where you are given information from an independent source and won't always know the size of the array. Remember that tools like sizeof exist and can help you understand that information.

Initializing arrays

So far, we've created empty arrays or loaded char arrays with input from the user at runtime. Just like simpler variable types, C allows you to initialize arrays when you define them.

For any array, you can supply a list of values inside a pair of curly braces, separated by commas. Here are a few examples:

int days_in_month[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
char vowels[6] = { 'a', 'e', 'i', 'o', 'u', 'y' };
float readings[7] = { 8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5 };

Notice that the declared size of the array matches the number of values supplied to initialize the array. In this situation, C allows a nice shorthand: you can omit the explicit size in between the square brackets. The compiler will allocate the correct amount of memory to fit the initialization list exactly. This means we could rewrite our previous snippet like this:

int days_in_month[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
char vowels[] = { 'a', 'e', 'i', 'o', 'u', 'y' };
float readings[] = { 8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5 };

Strings, however, are a special case. C supports the notion of string literals. This means you can use a sequence of characters between double quotes as a value. You can use a string literal to initialize a char[] variable. You can also use it almost anywhere a string variable would be allowed. (We saw this in "The Ternary Operator and Conditional Assignment" where we used the ternary operator (?:) to print true and false values as words instead of as 1 or 0.)

// Special initialization of a char array with a string literal
char secret[] = "password1";

// The printf() format string is usually a string literal
printf("Hello, world!\n");

// And we can print literals, too
printf("The value stored in %s is '%s'\n", "secret", secret);

You can also initialize a string by supplying individual characters inside curly braces, but that is generally harder to read. You have to remember to include the terminating null character, and this verbose option doesn't provide any other real advantage over the use of a string literal.

Accessing array elements

Once you have an array created, you can access individual elements inside the array using square brackets. You give an index number inside the square brackets, where the first element has an index value of 0. To print the second vowel or the days in July from our earlier arrays, for example:

  printf("The second vowel is: %c\n", vowels[1]);
  printf("July has %d days.\n", days_in_month[6]);

These statements would produce the following output if bundled into a complete program:

The second vowel is: e
July has 31 days.

But the value we supply inside the square brackets does not need to be a fixed number. It can be any expression that results in an integer. (If you have enough memory, it could be a long or other, larger integer type.) This means you can use a calculation or a variable as your index. For example, if we store the "current month" in a variable and use the typical values for months (January is 1, February is 2, and so on), then we could print the number of days in July using the following code:

  int month = 7;
  printf("July (month %d) has %d days.\n", month, days_in_month[month - 1]);

The ease and flexibility of accessing these members is part of what makes arrays so popular. After a bit of practice, you'll find them indispensable!

Warning

The value inside the square brackets needs to be "in bounds," and C will not check it for you. For example, if you tried printing the days in the 15th month like we did for July, you'd see something like "Invalid (month 15) has -1574633234 days." C won't stop you (reading outside an array is undefined behavior, and here it happened not to crash), but neither did we get a usable value. And assigning values (which we discuss next) to invalid slots in an array is how you cause a buffer overflow. This classic security exploit gets its name from the notion of an array as a storage buffer. You "overflow" it by writing values past the end of the array. If you get lucky (or are very devious), you can write executable code and trick the computer into running your commands instead of the intended program.

Changing array elements

You can also change the value of a given array position using the square bracket notation. For example, we could alter the number of days in February to accommodate a leap year:

if (year % 4 == 0) {
  // Forgive the naive leap year calculation :)
  days_in_month[1] = 29;
}

This type of post-declaration assignment is handy (or often even necessary) when you have more dynamic data. With the Arduino projects we'll cover later, for example, you might want to keep the 10 most recent sensor readings. You won't have those readings when you declare your array. So you can set aside 10 slots, and just fill them in later:

float readings[10];
// ... interesting stuff goes here to set up the sensor and read it
readings[7] = latest_reading;

Just make sure you supply a value of the same type as (or at least compatible with) the array. Our readings array, for example, is expecting floating point numbers. If we were to assign a character to one of the slots, it would "fit" in that slot, but it would produce a strange answer. Assigning the letter x to readings[8] would end up putting the ASCII value of lowercase x (120) in the slot as a float value of 120.0.

Iterating through arrays

The ability to use a variable as an index makes working with an entire array a simple loop task. We could print out all the days_in_month counts using a for loop, for example:

for (int m = 0; m < 12; m++) {
  // remember the array starts at 0, but humans start at 1
  printf("Days in month %d is %d.\n", m + 1, days_in_month[m]);
}

This snippet produces the following output. We can get a sense of just how powerful the combination of arrays and loops could be. With just a tiny bit of code, we get some fairly interesting output:

Days in month 1 is 31.
Days in month 2 is 28.
Days in month 3 is 31.
Days in month 4 is 30.
Days in month 5 is 31.
Days in month 6 is 30.
Days in month 7 is 31.
Days in month 8 is 31.
Days in month 9 is 30.
Days in month 10 is 31.
Days in month 11 is 30.
Days in month 12 is 31.

You're free to use the elements of your array however you need to. You aren't limited to printing them out. As another example, we could calculate the average reading from our readings array like so:

float readings[] = { 8.9, 8.6, 8.5, 8.7, 8.9, 8.8, 8.5 };

// Use our sizeof trick to get the number of elements
int count = sizeof readings / sizeof (float);
float total = 0.0;
float average;
for (int r = 0; r < count; r++) {
  total += readings[r];
}
average = total / count;
printf("The average reading is %0.2f\n", average);

This example highlights just how much C you have learned in only a few chapters! If you want some more practice, build this snippet into a complete program. Compile and run it to make sure you have it working. (The average should be 8.70, by the way.) Then add some more variables to capture the highest and lowest readings. You'll need some if statements to help there. You can see one possible solution in arrays.c in the examples for this chapter.

Review of Strings

I have noted that strings are really just arrays of type char with some extra features supported by the language itself, such as literals. But since strings represent the easiest way to communicate with users, I want to highlight more of what you can do with strings in C.

Initializing strings

We have already seen how to declare and initialize a string. If you know the value of the string ahead of time, you can use a literal. If you don't know the value, you can still declare the variable and then use scanf() to ask the user what text to store. But what if you wanted to do both? Assign an initial default and then let the user supply an optional new value that overrides the default?

Happily, you can get there, but you do have to plan ahead a little. It might be tempting to use the default value when you first declare your variable, and then let the user provide a different value at runtime if they want. This works, but it requires an extra question to the user ("Do you want to change the background color, yes or no?") and also assumes the user will supply a valid value as an alternative. Such assumptions are often safe as you are likely the only user while you're learning a new language. But in programs you share with others, it's better not to assume what the user will do.

String literals also make it tempting to think you can simply overwrite an existing string just like you can with int or float variables. But a string really is just a char[], and arrays are not assignable beyond the optional initialization when you declare them.

These limitations can all be overcome with the use of things like functions, which we'll explore in Chapter 5. In fact, functions that manipulate strings at runtime are so useful that they have been bundled up into their own part of the standard library, declared in the string.h header, and we'll look at several of them in Chapter 7.

For now, I want you to remember that string literals can make the initialization of character arrays simple and readable, but that at their heart, strings in C are not like numbers and individual characters.

Accessing individual characters

But I do want to reiterate that strings are just arrays. You can access individual characters in your string using the same syntax you use to access the members of any other array. For example, we could find out if a given phrase contains a comma by looking at each character in the phrase. Here's ch04/comma.c:

#include <stdio.h>

int main() {
  char phrase[] = "Hello, world!";
  int i = 0;
  // keep looping until the end of the string
  while (phrase[i] != '\0') {
    if (phrase[i] == ',') {
      printf("Found a comma at position %d.\n", i);
      break;
    }
    // try the next character
    i++;
  }
  if (phrase[i] == '\0') {
    // Rats. Made it to the end of the string without a match.
    printf("No comma found in %s\n", phrase);
  }
}

This program actually uses the array nature of the string a few times. Our loop condition depends on accessing a single character of the string just like the if condition that helps answer our original question. And we test an individual character at the very end to see if we found something or not. We'll look at several string-related functions in Chapter 7, but hopefully you see how you could accomplish things like copying or comparing strings using a good loop and the square brackets to march one character at a time through your array.

Multidimensional Arrays

It may not be obvious since strings are already arrays, but you can store an array of strings in C. But because there is no "string type" that you can use when declaring such an array, how do you do it? It turns out C supports the idea of a multidimensional array, so you can create an array of char[] just like other arrays:

char month_names[][];

Seems fair. (As written, though, the compiler will reject it; we'll fill in the sizes in a moment.) But what is not obvious in that declaration is what the two pairs of square brackets refer to. When declaring a two-dimensional array like this, the first pair can be thought of as the row index, and the second is the column. Another way to think about it is the first size tells you how many character arrays we'll be storing and the second size tells you how long each of those arrays can be.

We know how many months there are, and a little research tells us the longest name is September, with nine letters. Add on one more for our terminating null character and 10 slots per name would be just enough; we'll allow 11 to leave a spare slot, and define our month_names array like this:

char month_names[12][11];

You could also initialize this two-dimensional array since we know the names of the months and don't require user input:

char month_names[12][11] = {
  "January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"
};

But here I cheated a little with the initialization by using string literals, so the second dimension of the month_names array isn't readily apparent. The first dimension is the months, and the second (hidden) dimension is the individual characters that make up the month names. If you are working with other data types that don't have this string literal shortcut, you can use nested curly brace lists like this:

int multiplication[5][5] = {
  { 0, 0, 0,  0,  0 },
  { 0, 1, 2,  3,  4 },
  { 0, 2, 4,  6,  8 },
  { 0, 3, 6,  9, 12 },
  { 0, 4, 8, 12, 16 }
};

It might be tempting to assume the compiler can determine the size of the multidimensional structure, but sadly, you must supply the capacity for each dimension beyond the first. For our month names, for example, we could start off without the "12" for how many names, but not without the "11" that sets the capacity of each individual name:

// This shortcut is ok
char month_names[][11] = { "January", "February" /* ... */ };

// This shortcut is NOT
char month_names[][] = { "January", "February" /* ... */ };

You'll eventually internalize these rules, but the compiler (and many editors) will always be there to catch you if you make a small mistake.

Accessing Elements in Multidimensional Arrays

With our array of month names, it is straightforward to access any particular month. It looks just like accessing an element of any other one-dimensional array:

printf("The name of the first month is: %s\n", month_names[0]);

// Output: The name of the first month is: January

But how would we access an element in the multiplication two-dimensional array? We use two indices:

printf("Looking up 3 x 4: %d\n", multiplication[3][4]);

// Output: Looking up 3 x 4: 12

Notice that in this multiplication table, using zero as the first index, which might seem strange, turns out to be useful: index 0 gives us a row (or column) of valid multiplication answers, since any number times zero is zero.

And with two indices, you'll need two loops if you want to print out all of the data. We can take the work we did in "Nested Loops and Tables" and use it to access our stored values rather than generating the numbers directly. Here's the printing snippet from ch04/print2d.c:

  for (int row = 0; row < 5; row++) {
    for (int col = 0; col < 5; col++) {
      printf("%3d", multiplication[row][col]);
    }
    printf("\n");
  }

And here is our nicely formatted table:

ch04$ gcc print2d.c
ch04$ ./a.out
  0  0  0  0  0
  0  1  2  3  4
  0  2  4  6  8
  0  3  6  9 12
  0  4  8 12 16

We'll see some other options in Chapter 6 for more tailored multidimensional storage. In the near term, just remember that you can create more dimensions with more pairs of square brackets. While you'll likely use one-dimensional arrays most of the time, tables are common enough, and spatial data often fits in three-dimensional "cubes." Few programmers will ever need more than that, especially those of us concentrating on microcontrollers, but C does support arrays with even more dimensions.

Storing Bits

Arrays allow us to store truly vast quantities of data with relative ease. At the other end of the spectrum, C has several operators that you can use to manipulate very small amounts of data. Indeed, you can work with the absolute smallest pieces of data: individual bits.

When C was developed in the 1970s, every byte of memory was expensive, and therefore precious. As I noted at the beginning of the chapter, if you had a particular variable that stored Boolean answers, using 16 bits for an int or even just 8 bits for a char would be a little wasteful. If you had an array of such variables, it could become very wasteful. Desktop computers these days can manage that type of waste without blinking an eye (or an LED), but our microcontrollers often need all the storage help they can get.

Binary, Octal, Hexadecimal

Before we tackle the operators in C that access and manipulate bits, let's review some notation for discussing binary values. If we have a single bit, a 0 or a 1 is sufficient and that's easy enough. However, if we want to store a dozen bits inside one int variable, we need a way to describe the value of that int. We normally write an int's value in decimal (base 10), but base 10 does not map cleanly to individual bits. For that, octal and hexadecimal notation are much clearer. (Binary, or base 2, notation would obviously be clearest, but large numbers get very long in binary. Octal and hexadecimal, often just "hex," are a good compromise.)

When we talk about numbers, we often implicitly use base 10, thanks to the digits (ooh, get it?) on our hands. Computers don't have hands (discounting robots, of course) and don't count in base 10. They use binary. Two digits, 0 and 1, make up the entirety of their world. If you group three binary digits, you can represent the decimal numbers 0 through 7, which is eight total numbers, so this is base 8, or octal. Add a fourth bit and you can represent 0 through 15, which covers the individual "digits" in hexadecimal. Table 4-1 shows these first 16 values in all four bases.

Table 4-1. Numbers in decimal, binary, octal, and hexadecimal
Decimal	Binary	Octal	Hexadecimal
0	0000 0000	000	0x00
1	0000 0001	001	0x01
2	0000 0010	002	0x02
3	0000 0011	003	0x03
4	0000 0100	004	0x04
5	0000 0101	005	0x05
6	0000 0110	006	0x06
7	0000 0111	007	0x07
8	0000 1000	010	0x08
9	0000 1001	011	0x09
10	0000 1010	012	0x0A / 0x0a
11	0000 1011	013	0x0B / 0x0b
12	0000 1100	014	0x0C / 0x0c
13	0000 1101	015	0x0D / 0x0d
14	0000 1110	016	0x0E / 0x0e
15	0000 1111	017	0x0F / 0x0f

You might notice that I always showed eight digits in the binary column, three in octal, and two in hex. The byte (8 bits) is a very common unit to work with in C. Binary numbers often get shown in groups of four, with as many groups as required to cover the largest number being discussed. So for a full byte of 8 bits, which can store any value from 0 to 255, for example, you would see a binary value with two groupings of four digits. Similarly, three octal digits can display any byte's value, and hexadecimal numbers need only two digits. Note also that the hex digits A through F are not case sensitive. (Neither is the "x" in the hexadecimal prefix, but an uppercase "X" can be harder to distinguish.)

We'll be using binary notation from time to time when working with microcontrollers in the latter half of this book, but you may have already run into hexadecimal numbers if you have written any styled text in HTML or CSS or similar markup languages. Colors in these documents are often represented with the hex values for a byte of red, a byte of green, a byte of blue, and occasionally a byte of alpha (transparency). So a full red that ignores the alpha channel would be FF0000. Now that you know two hex digits can represent one byte, it may be easier to read such color values.

To help you get accustomed to these different bases, try filling out the missing values in Table 4-2. (You can check your answers with Table 4-4 at the end of the chapter.) The numbers are not in any particular order, by the way. I want to keep you on your toes!

Table 4-2. Converting between bases
Decimal	Binary	Octal	Hexadecimal
14	?	016	?
?	0010 0000	?	?
?	?	021	11
50	?	?	32
?	?	052	?
?	?	?	13
167	?	?	?
?	1111 1001	?	?

Modern browsers can convert bases for you right in the search bar, so you probably won't need to memorize the full 256 values possible in a byte. But it will still be useful if you can estimate the size of a hex value or determine if an octal ASCII code is probably a letter or a number.

Octal and Hexadecimal Literals in C

The C language has special options for expressing numeric literals in octal and hex. Octal literals start with a simple 0 as a prefix, although you can have multiple zeroes if you are keeping all of your values the same width, like we did in our base tables. (That makes a leading zero significant in C: 012 is ten, not twelve.) For hex values, you use the prefix 0x or 0X. You typically match the case of the "X" character to the case of any of the A-F digits in your hex value, but this is just a convention.

Here's a snippet showing how to use some of these prefixes:

int line_feed = 012;
int carriage_return = 015;
int red = 0xff;
int blue = 0x7f;

Some compilers support nonstandard prefixes or suffixes for representing binary literals, but as the "nonstandard" qualifier suggests, they were not part of the official C language until the C23 standard adopted the 0b prefix.

Input and Output of Octal and Hex Values

The printf() function has built-in format specifiers to help you produce octal or hexadecimal output. Octal values can be printed with the %o specifier, and hex can be shown with either %x or %X, depending on whether you want lower- or uppercase output. These specifiers work with any int value, no matter which base you used to write it in your source code, which makes printf() a pretty easy way to convert from decimal to octal or hex. (For long values, add the l length modifier, as in %lx.) We could easily produce a table similar to Table 4-1 (minus the binary column) using a loop and a single printf(). We can take advantage of the width and padding options of the format specifier to get our desired three octal digits and two hex digits. Take a look at ch04/dec_oct_hex.c:

#include <stdio.h>

int main() {
  printf(" Dec  Oct  Hex\n");
  for (int i = 0; i < 16; i++) {
    printf(" %3d  %03o  0x%02X\n", i, i, i);
  }
}

Notice that we just reuse the exact same variable for each of the three columns. Also notice that when printing the hexadecimal version, I manually added the "0x" prefix; it is not included in the %x or %X formats. Here are a few of the first and last lines:

ch04$ gcc dec_oct_hex.c
ch04$ ./a.out
 Dec  Oct  Hex
   0  000  0x00
   1  001  0x01
   2  002  0x02
   3  003  0x03
 ...
  13  015  0x0D
  14  016  0x0E
  15  017  0x0F

Neat. Just the output we wanted. On the input side, with scanf(), the format specifiers work in an interesting way. They are all still used to get numeric input from the user, but now each specifier tells scanf() which base to use when converting the digits you type. If you specify decimal input (%d), you cannot use hex values. Conversely, if you specify hex input (%x or %X) and type only the digits 0 through 9 (i.e., you don't use any of the A-F digits), the number will still be interpreted as base 16.

Note

The specifiers %d and %i are normally interchangeable. In a printf() call, they will result in identical output. In a scanf() call, however, the %d option requires you to enter a simple base 10 number. The %i specifier allows you to use the C literal prefixes to enter a value in a different base: 0x for hexadecimal or a leading 0 for octal. (So typing 012 with %i gives you ten, not twelve.)

We can illustrate this with a simple converter program, ch04/rosetta.c, that will translate different inputs to all three bases on output. The program first asks which base you want to convert from, then uses an if/else if/else block to pick the matching scanf() specifier.

#include <stdio.h>

int main() {
  char base;
  int input;

  printf("Convert from? (d)ecimal, (o)ctal, he(x): ");
  scanf("%c", &base);

  if (base == 'o') {
    // Get octal input
    printf("Please enter a number in octal: ");
    scanf("%o", &input);
  } else if (base == 'x') {
    // Get hex input
    printf("Please enter a number in hexadecimal: ");
    scanf("%x", &input);
  } else {
    // assume decimal input
    printf("Please enter a number in decimal: ");
    scanf("%d", &input);
  }
  printf("Dec: %d,  Oct: %o,  Hex: %x\n", input, input, input);
}

Here are a few example runs:

ch04$ gcc rosetta.c

ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): d
Please enter a number in decimal: 55
Dec: 55,  Oct: 67,  Hex: 37

ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): x
Please enter a number in hexadecimal: 37
Dec: 55,  Oct: 67,  Hex: 37

ch04$ ./a.out
Convert from? (d)ecimal, (o)ctal, he(x): d
Please enter a number in decimal: 0x37
Dec: 0,  Oct: 0,  Hex: 0

Interesting. The first two runs went according to plan. The third run didn't create an error but didn't really work, either. What happened here is a sort of "feature" of scanf(). It tried very hard to bring in a decimal number. It found the character 0 in our input, which is a valid decimal digit, so it started parsing that character. But it next encountered the x character, which is not valid for a base 10 number. So that was the end of the parsing, and our program converted the value 0 into each of the three bases.

Try running this program yourself and switch the mode a few times. Do you get the behavior you expect? Can you cause any errors?

Knowing what we do about the difference between %i and other numeric specifiers in scanf(), can you see how to make this program a little simpler? It should be possible to accept any of the three bases for input without the big if statement. I'll leave this problem to you as an exercise, but you can see one possible solution in the rosetta2.c file in the code examples for this chapter.

Bitwise Operators

Because C started out on limited hardware, it also lets you work with data at the bit level, quite apart from printing or reading numbers in other bases. C supports this work with bitwise operators. These operators allow you to tweak individual bits inside int variables (or char or long, of course). We'll see some fun uses of these features with the Arduino microcontroller in Chapter 10.

Table 4-3 describes these operators and shows some examples that make use of the following two variables:

char a = 0xD; // 1101 in binary
char b = 0x7; // 0111 in binary

Table 4-3. Bitwise operators in C
Operator	Name	Description	Example
&	bitwise and	Both bits must be 1 to yield a 1	a & b == 0101
|	bitwise or	Either bit can be 1 to yield a 1	a | b == 1111
~	bitwise not	Yields the opposite of the input bit	~a == 1111 0010
^	bitwise xor	eXclusive OR, bits that don’t match yield a 1	a ^ b == 1010
<<	left shift	Move bits to the left by a number of places	a << 3 == 0110 1000
>>	right shift	Move bits to the right by a number of places	b >> 2 == 0001

Bitwise operators work only on integer types such as char, int, and long. C will not let you apply them directly to float or double values. You usually pick an integral type that is big enough to hold however many individual bits you need. Because these operators “edit” the bits of a given variable, you often see them used with compound assignment operators (op=). If you have five LEDs, for example, you could keep track of their on/off state with a single char variable, as in this snippet:

char leds = 0;  // Start with everyone off, 0000 0000

leds |= 8;    // Turn on the 4th led from the right, 0000 1000
leds ^= 0x1f; // Toggle all lights, 0001 0111
leds &= 0x0f; // Turn off 5th led, leave others as is, 0000 0111

Five int or char values likely won’t make the difference in whether you can store or run a program on a microcontroller, even ones with only one or two kilobytes of memory, but those small storage needs do add up. If you’re tracking a panel of LEDs with hundreds or thousands of lights, it makes a difference how tightly you can store their state. One size rarely fits all, so remember your options and pick one that balances ease of use against any resource constraints you have.

Mixing Bits and Bytes

We now have enough elements of C under our belts to start writing some really interesting code. We can combine all of our previous discussions on bits, arrays, types, looping, and branching to tackle a popular way of encoding binary data in text. One format for transmitting binary data through networks of devices with potentially limited resources is to convert it to simple lines of text. This is known as “base64” encoding and is still used in things like inline email attachments for images. The 64 comes from the fact that this encoding uses 6-bit chunks, and 2 to the 6th power is 64. We use numbers, lowercase letters, uppercase letters, and two other characters more or less arbitrarily chosen, typically the plus (+) and the forward slash (/).³

For this encoding, values 0 through 25 are the uppercase letters A through Z. Values 26 through 51 are the lowercase letters a through z. Values 52 through 61 are the digits 0 through 9, and finally, value 62 is the plus sign, and 63 is the forward slash.

But aren’t bytes 8 bits long? Yes, they are. That’s exactly where all of our recent topics come into play! We can use this new knowledge to change those 8-bit chunks into 6-bit chunks. Three 8-bit bytes hold 24 bits, which is exactly four 6-bit chunks.

Figure 4-3 shows a small example of converting three bytes into a string of base64 text. These happen to be the first few bytes of a valid JPEG file, but you could work on any source you like. This is a fairly trivial bit of binary data, of course, but it will validate our algorithm.

We have nine bytes total to encode in our example, but really we just want to take things three bytes at a time, like the illustration, and repeat. Sounds like a job for a loop! We could use any of our loops, but we’ll go with a for loop since we know where to start and end, and we can count up by threes. We’ll pull out three bytes from the source array into three variables, just for convenience of discussion.

unsigned char source[9] = { 0xd8,0xff,0xe0,0xff,0x10,0x00,0x46,0x4a,0x46 };
char buffer[4] = { 0, 0, 0, 0 };

for (int i = 0; i < 9; i += 3) {
  unsigned char byte1 = source[i];
  unsigned char byte2 = source[i + 1];
  unsigned char byte3 = source[i + 2];
  // ...
}

The next big step is getting the four 6-bit chunks into our buffer. We can use our bitwise operators to grab what we need. Look back at Table 4-3. The leftmost six bits of byte1 make up our first 6-bit chunk. In this case, we can just shift those six bits to the right two slots:

  buffer[0] = byte1 >> 2;

Neat! One down, three to go. The second 6-bit chunk, though, is a little messy because it uses the two remaining bits from byte1 and four bits from byte2. There are several ways to do this, but weâll process the bits in order and just break up the assignment to the next slot in buffer into two steps:

  buffer[1] = (byte1 & 0x03) << 4;   
  buffer[1] |= (byte2 & 0xf0) >> 4;

In the first line, we take the right two bits from byte1 and scoot them to the left four spaces to make room for the rest of our 6-bit chunk.
In the second line, we take the left four bits from byte2, scoot them to the right four spaces, and use |= to merge them into buffer[1] without disturbing the two bits we just placed.

Halfway there! We can do something very similar for the third 6-bit chunk:

  buffer[2] = (byte2 & 0x0f) << 2;
  buffer[2] |= (byte3 & 0xc0) >> 6;

In this case, we take the right four bits of byte2 and scoot them left two slots to make room for the left two bits of byte3. Like before, we have to scoot those two bits all the way to the right first. Our last 6-bit chunk is another easy one. We just want the right six bits of byte3, no scooting required:

  buffer[3] = byte3 & 0x3f;

Hooray! We have successfully done the 3x8-bit to 4x6-bit conversion! Now we just need to print out each of the values in our buffer array. Sounds like another loop. And if you recall that we have five ranges for our base64 “digits,” that calls for a conditional of some sort. We could list out all 64 cases in a switch, but that feels tedious. (It would be very self-documenting, at least.) An if/else if chain should do nicely. Inside any particular branch, we’ll do a little character math to get the correct value. As you read this next snippet, see if you can figure out how that character math is working its magic:

  for (int b = 0; b < 4; b++) {
    if (buffer[b] < 26) {
      // value 0 - 25, so uppercase letter
      printf("%c", 'A' + buffer[b]);
    } else if (buffer[b] < 52) {
      // value 26 - 51, so lowercase letter
      printf("%c", 'a' + (buffer[b] - 26));
    } else if (buffer[b] < 62) {
      // value 52 - 61, so a digit
      printf("%c", '0' + (buffer[b] - 52));
    } else if (buffer[b] == 62) {
      // our "+" case, no need for math, just print it
      printf("+");
    } else if (buffer[b] == 63) {
      // our "/" case, no need for math, just print it
      printf("/");
    } else {
      // Yikes! Error. We should never get here.
      printf("\n\n Error! Bad 6-bit value: %d\n", buffer[b]);
    }
  }

Does the character math make sense? Since char is an integer type, you can “add” to characters. If we add one to the character A, we get B. Add two to A and we get C, etc. (This relies on the letters and digits having consecutive codes, as they do in ASCII.) For the lowercase letters and the digits, we first have to realign our buffered value so it is in a range starting at zero. The last two cases are easy, since we have one value that maps directly to one character. Hopefully, we never hit our else clause, but that is exactly what those clauses are for. If we got something wrong, print out a warning!

Whew! Those are some impressive moving parts. And if you want to build tiny devices that communicate with other tiny devices or the cloud, like a tiny security camera sending a picture to your phone, these are exactly the kind of moving parts you’ll bump into.

Let’s assemble them in one listing (ch04/encode64.c) with the other bits we need for a valid C program:

#include <stdio.h>

int main() {
  // Manually specify a few bytes to encode for now
  unsigned char source[9] = { 0xd8,0xff,0xe0,0xff,0x10,0x00,0x46,0x4a,0x46 };
  char buffer[4] = { 0, 0, 0, 0 };

  // sizeof(char) == 1 byte, so the array's size in bytes is also its length
  int source_length = sizeof(source);
  for (int i = 0; i < source_length; i++) {
    printf("0x%02x ", source[i]);
  }
  printf("==> ");
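  // Each pass reads three bytes, so source_length must be a multiple of 3
  // (true for our nine bytes); otherwise source[i + 1] or source[i + 2]
  // would read past the end of the array
  for (int i = 0; i < source_length; i += 3) {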
    unsigned char byte1 = source[i];
    unsigned char byte2 = source[i + 1];
    unsigned char byte3 = source[i + 2];

    // Now move the appropriate bits into our buffer
    buffer[0] = byte1 >> 2;
    buffer[1] = (byte1 & 0x03) << 4;
    buffer[1] |= (byte2 & 0xf0) >> 4;
    buffer[2] = (byte2 & 0x0f) << 2;
    buffer[2] |= (byte3 & 0xc0) >> 6;
    buffer[3] = byte3 & 0x3f;

    for (int b = 0; b < 4; b++) {
      if (buffer[b] < 26) {
        // value 0 - 25, so uppercase letter
        printf("%c", 'A' + buffer[b]);
      } else if (buffer[b] < 52) {
        // value 26 - 51, so lowercase letter
        printf("%c", 'a' + (buffer[b] - 26));
      } else if (buffer[b] < 62) {
        // value 52 - 61, so a digit
        printf("%c", '0' + (buffer[b] - 52));
      } else if (buffer[b] == 62) {
        // our "+" case, no need for math, just print it
        printf("+");
      } else if (buffer[b] == 63) {
        // our "/" case, no need for math, just print it
        printf("/");
      } else {
        // Yikes! Error. We should never get here.
        printf("\n\n Error! Bad 6-bit value: %d\n", buffer[b]);
      }
    }
  }
  printf("\n");
}

As always, I encourage you to type in the program yourself, making any adjustments you want or adding any comments to help you remember what you learned. You can also compile the encode64.c file and then run it. Here’s the output:

ch04$ gcc encode64.c
ch04$ ./a.out
0xd8 0xff 0xe0 0xff 0x10 0x00 0x46 0x4a 0x46 ==> 2P/g/xAARkpG

Very, very cool. Congratulations, by the way! That is a nontrivial bit of code there. You should be proud. But if you want to really test your skills, try writing your own decoder to reverse this process. If you start with the output above, do you get the original nine bytes? (You can check your answer against mine: ch04/decode64.c.)

Conversion Answers

Whether or not you tackle decoding the base64 encoded string, hopefully you tried converting the values in Table 4-2 yourself. You can compare your answers here. Or use the rosetta.c program!

Table 4-4. Base conversion answers
Decimal	Binary	Octal	Hexadecimal
14	0000 1110	016	0E
32	0010 0000	040	20
17	0001 0001	021	11
50	0011 0010	062	32
42	0010 1010	052	2A
19	0001 0011	023	13
167	1010 0111	247	A7
249	1111 1001	371	F9

Next Steps

C’s support of simple arrays opens up a wide world of storage and retrieval options for just about any type of data. You do have to pay attention to the number of elements that you expect to use, but within those bounds, C’s arrays are quite efficient. And if you are only storing small yes/no or on/off values, C has several operators that make it possible to squeeze those values into the individual bits of a larger data type like an int. Modern desktops rarely require that much attention to detail, but some of our Arduino options in the latter half of this book care very much!

So what’s next? Well, our programs are getting interesting enough that we’ll want to start breaking the logic up into manageable slices. Think about this book, for example. It is not made up of one excessive run-on sentence. It is broken into chapters. Those chapters, in turn, are broken into sections. Those sections are broken into paragraphs. It is usually easier to discuss a single paragraph than it is an entire book. C allows you to perform this type of breakdown for your own logic. And once you have the logic in digestible blocks, you can use those blocks just like we have been doing with the printf() and scanf() functions. Let’s dive in!

¹ Exactly how things go wrong may vary. Your operating system or version, compiler version, or even the conditions on your system at runtime can all affect the output. The point is to be careful not to overflow your arrays.

² The gcc -fstack-protector option can be used to detect some buffer overflows and abort the program before the overflow can be used maliciously. This is a compile-time flag that is off by default in stock gcc, although some Linux distributions enable it in the gcc they ship.

³ As an example of an alternative pair of extra characters, the base64url variation uses a minus (“-”) and underscore (“_”).

Smaller C by Marc Loy

Chapter 4. Bits and (Many) Bytes

Storing Multiple Things with Arrays

Creating and Manipulating Arrays

Figure 4-1. An empty array of type `char` called `name`

Figure 4-2. A char array with a string

Note

Length versus capacity

Warning

Initializing arrays

Accessing array elements

Warning

Changing array elements

Iterating through arrays

Review of Strings

Initializing strings

Accessing individual characters

Multidimensional Arrays

Accessing Elements in Multidimensional Arrays

Storing Bits

Binary, Octal, Hexadecimal

Octal and Hexadecimal Literals in C

Input and Output of Octal and Hex Values

Note

Bitwise Operators

Mixing Bits and Bytes

Figure 4-3. Going from 8-bit to 6-bit chunks with encoding

Conversion Answers

Next Steps
