Chapter 1. Introduction to Computer Architecture

Each machine has its own, unique personality which probably could be defined as the intuitive sum total of everything you know and feel about it. This personality constantly changes, usually for the worse, but sometimes surprisingly for the better . . .

Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance

This book is about designing and building specialized computers. We all know what a computer is. It’s that box that sits on your desk, quietly purring away (or rattling if the fan is shot), running your programs and regularly crashing (if you’re not running some variety of Unix). Inside that box is the electronics that runs your software, stores your information, and connects you to the world. It’s all about processing information. Designing a computer, therefore, is about designing a machine that holds and manipulates data.

Computer systems fall into essentially two separate categories. The first, and most obvious, is that of the desktop computer. When you say “computer” to someone, this is the machine that usually comes to his mind. The second type of computer is the embedded computer, a computer that is integrated into another system for the purposes of control and/or monitoring. Embedded computers are far more numerous than desktop systems, but far less obvious. Ask the average person how many computers she has in her home, and she might reply that she has one or two. In fact, she may have 30 or more, hidden inside her TVs, VCRs, DVD players, remote controls, cell phones, ovens, toys, and a host of other devices. In this chapter, we’ll look at computer architecture in general, which applies to both embedded and desktop computers.

The underlying architectures of desktop computers and embedded computers are fundamentally the same. At a crude level, both have a processor, memory, and some form of input and output. The primary difference lies in their intended use, and this is reflected in their software. Desktop computers can run a variety of application programs, with system resources orchestrated by an operating system. Running different application programs changes the functionality of the desktop computer. One moment, it may be used as a word processor; the next, it is an MP3 player or a database client. Which software is loaded and run is under user control.

In contrast, the embedded computer is normally dedicated to a specific task. It typically has one application and one application only, and this is permanently running. The advantage of using an embedded microprocessor over dedicated electronics is that the functionality of the system is determined by the software, not the hardware. The embedded computer may or may not have an operating system, and it rarely provides the user with the ability to arbitrarily install new software. The software is normally contained in the system’s nonvolatile memory, unlike a desktop computer, in which the nonvolatile memory contains only boot software and (maybe) low-level drivers.

Embedded hardware is often much simpler than a desktop system, but it can also be far more complex. An embedded computer may be implemented in a single chip with just a few support components, and its purpose may be as simple as a controller for a garden-watering system. Or the embedded computer may be a 150-processor, distributed parallel machine responsible for all the flight and control systems of a commercial jet. As diverse as embedded hardware may be, the underlying principles of design are the same.

This chapter introduces some important concepts relating to computer architecture, with specific emphasis on those topics relevant to embedded systems. Its purpose is to give you grounding before moving on to the more hands-on information that begins in Chapter 2. In this chapter, you’ll learn about the basics of processors, interrupts, the difference between RISC and CISC, parallel systems, memory, and I/O.


At the simplest level, a computer is a machine designed to process, store, and retrieve data. Data may be numbers in a spreadsheet, characters of text in a document, dots of color in an image, waveforms of sound, or the state of some system, such as an air conditioner or a CD player. It is important to note that all data is stored in the computer as numbers.

The computer manipulates the data by performing operations on the numbers. Displaying an image on a screen is accomplished by moving an array of numbers to the video memory, each number representing a pixel of color. To play an MP3 audio file, the computer reads an array of numbers from disk and into memory, manipulates those numbers to convert the compressed audio data into raw audio data, and then outputs the new set of numbers (the raw audio data) to the audio chip.

Everything that a computer does, from web browsing to printing, involves moving and processing numbers. The electronics of a computer is nothing more than a system designed to hold, move, and change numbers.

A computer system is composed of many parts, both hardware and software. At the heart of the computer is the processor, the hardware that executes the computer programs. The computer also has memory, often several different types in a single system. The memory is used to store programs while the processor is running them, as well as to store data that the programs are manipulating. The computer also has devices for storing data or exchanging data with the outside world. These may allow the input of text via a keyboard, the display of information on a screen, or the movement of programs and data to or from a disk drive.

The software controls the operation and functionality of the computer. There are many “layers” of software in the computer (Figure 1-1). Typically, a given layer will interact with only the layer immediately above or below.


Figure 1-1. Software layers

At the lowest level are programs that are run by the processor when the computer first powers up. These programs initialize the other hardware subsystems to a known state and configure the computer for correct operation. This software, because it is permanently stored in the computer’s nonvolatile memory, is known as firmware.

The bootloader is located in the firmware. The bootloader is a special program run by the processor that reads the operating system from disk (or nonvolatile memory or network) and places it in memory so that the processor may then run it. The bootloader is present in desktop computers and workstations and may also be present in some embedded computers.

Above the firmware, the operating system controls the operation of the computer. It organizes the use of memory; controls devices such as the keyboard, mouse, screen, disk drives; and so on. It is also the software that often provides an interface to the user, enabling him to run application programs and access his files on disk. The operating system also provides a set of software tools for application programs, providing a mechanism by which they too can access the screen, disk drives, and so on. Not all embedded systems use or even need an operating system. Often, an embedded system will simply run code dedicated to its task, and the presence of an operating system is overkill. In other instances, such as network routers, an operating system provides necessary software integration and greatly simplifies the development process. Whether an operating system is needed and useful really depends on the intended purpose of the embedded computer and, to a lesser degree, on the preference of the designer.

At the highest level, the application constitutes the programs that provide the functionality of the computer. Everything below the application is considered system software. For embedded computers, the boundary between application and system software is often blurred. This reflects the underlying principle in embedded design that a system should be designed to achieve its objective in as simple and straightforward a manner as possible.


The processor is the most important part of a computer, the component around which everything else is centered. In essence, the processor is the computing part of the computer. A processor is an electronic device capable of manipulating data (information) in a way specified by a sequence of instructions. The instructions are also known as opcodes or machine code. This sequence of instructions may be altered to suit the application; hence, computers are programmable. The sequence of instructions is what constitutes a program.

Instructions in a computer are numbers, just like data. Different numbers, when read and executed by a processor, cause different things to happen. A good analogy is the mechanism of a music box. A music box has a rotating drum with little bumps and a row of prongs. As the drum rotates, different prongs in turn are activated by the bumps, and music is produced. In a similar way, the bit patterns of instructions feed into the execution unit of the processor. Different bit patterns activate or deactivate different parts of the processing core. Thus, the bit pattern of a given instruction may activate an addition operation, while another bit pattern may cause a byte to be stored to memory.
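To make the music-box idea concrete, here is a minimal C sketch of a fetch-decode-execute loop in which the bit pattern of each instruction word selects the operation. The instruction format (opcode in the high byte, operand in the low byte), the opcodes, and the function name `toy_run` are all invented for illustration; they are not the encoding of the dsPIC or any other real processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy accumulator machine (not a real ISA). */
enum { OP_HALT = 0x00, OP_LOADI = 0x01, OP_ADDI = 0x02, OP_STORE = 0x03 };

/* Runs a program of 16-bit instruction words: high byte = opcode,
 * low byte = operand. Returns the accumulator when HALT is reached or
 * the program runs out. `data` must hold at least 256 bytes. */
uint8_t toy_run(const uint16_t *program, size_t length, uint8_t *data)
{
    uint8_t acc = 0;
    size_t pc = 0;                                  /* program counter */

    while (pc < length) {
        uint16_t word = program[pc++];              /* fetch */
        uint8_t opcode = (uint8_t)(word >> 8);      /* decode */
        uint8_t operand = (uint8_t)(word & 0xFFu);

        switch (opcode) {                           /* execute */
        case OP_LOADI: acc = operand; break;
        case OP_ADDI:  acc = (uint8_t)(acc + operand); break;
        case OP_STORE: data[operand] = acc; break;
        case OP_HALT:  return acc;
        default:       assert(!"unknown opcode"); return acc;
        }
    }
    return acc;
}
```

For instance, the four words 0x0105, 0x0203, 0x0310, 0x0000 load 5, add 3, store the result at data address 0x10, and halt, leaving 8 in both the accumulator and `data[0x10]`. To the machine they are only numbers; the `switch` plays the role of the prongs that each bit pattern activates.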

A sequence of instructions is a machine-code program. Each type of processor has a different instruction set, meaning that the functionality of the instructions (and the bit patterns that activate them) varies. Processor instructions are often quite simple, such as “add two numbers” or “call this function.” In some processors, however, they can be as complex and sophisticated as “if the result of the last operation was zero, then use this particular number to reference another number in memory, and then increment the first number once you’ve finished.” This will be covered in more detail in Section 1.1.4, later in this chapter.

A program that a given processor may execute might look something like:

B0 4F F7 01 00 07...

Humans find such programs very hard to write and even harder to understand. To make them easier to work with, we use a notation called assembly language, in which mnemonics are used to represent the opcodes. Assembly language instructions equate directly to their machine-code counterparts.

For example, the instruction B04FF7 is more easily understood by its assembly language mnemonic ADD.B #0xFF, W7. This is still a bit cryptic, so we usually add comments on the right-hand side to help us follow what is going on.

So, the preceding machine code written in assembly would be:

ADD.B #0xFF, W7    ; Add the byte -1 to register W7
CALL W7            ; call the subroutine pointed to by W7

Different processor families use different assembly languages. No two are alike, although some degree of similarity may be present. The previous examples are written in assembly language for the dsPIC processor. Other assembly languages, because they are based on very different processor hardware, have very different syntax. This is not of great importance to this book; just be aware that different processors use very different code.

No computer can understand assembly directly. Back in the olden days, when computers were steam-driven and tended by gnomes, software was assembled manually. Each instruction mnemonic was looked up and converted to the appropriate opcode by the programmer. While it is certainly character building, converting from assembly to opcodes is very tiresome, particularly with large programs. To make life easier, special programs, called assemblers, take mnemonics and convert them to opcodes.
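At its core, the lookup an assembler automates is a table search. The following C sketch shows just that step for an invented toy machine; the mnemonics, opcode values, 16-bit word layout, and the function name `assemble` are hypothetical, and a real assembler also parses operands, labels, and directives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mnemonic-to-opcode table for an invented toy machine. */
struct opcode_entry {
    const char *mnemonic;
    uint8_t opcode;
};

static const struct opcode_entry opcode_table[] = {
    { "HALT",  0x00 },
    { "LOADI", 0x01 },
    { "ADDI",  0x02 },
    { "STORE", 0x03 },
};

/* Translates one mnemonic plus an 8-bit operand into a 16-bit
 * instruction word (opcode in the high byte, operand in the low byte),
 * the same lookup a programmer once did by hand.
 * Returns 0 on success, or -1 if the mnemonic is not in the table. */
int assemble(const char *mnemonic, uint8_t operand, uint16_t *word)
{
    assert(mnemonic != NULL && word != NULL);
    size_t count = sizeof opcode_table / sizeof opcode_table[0];

    for (size_t i = 0; i < count; i++) {
        if (strcmp(opcode_table[i].mnemonic, mnemonic) == 0) {
            *word = (uint16_t)((opcode_table[i].opcode << 8) | operand);
            return 0;
        }
    }
    return -1;
}
```

With this table, `assemble("ADDI", 0x03, &w)` stores 0x0203 in `w`, while an unknown mnemonic such as `"JUMP"` is rejected with -1.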

Assembly language has been described as the “nuts-and-bolts language,” for you are writing code directly for the processor. For a lot of the software you will write, a high-level language like C will be the language of choice. High-level languages make developing software much easier, and your code is also portable (to a degree) between different target machines. Compilers of high-level languages convert your source code down to machine opcodes. Thus, by using a compiler, the programmer is relieved of having to know the specific details of the processor and of having to code her program directly in machine code.

So there are good reasons for using a high-level language. Yet, many times programmers write directly in assembly language. Why? Assembly and machine code, because they are “handwritten,” can be finely tuned to get the most performance out of the processor and computer hardware. This can be particularly important when dealing with time-critical operations with I/O devices. Further, coding directly in assembly can sometimes (but not always) result in a smaller code space. So, if you’re trying to cram complex software into a small amount of memory and need that software to execute quickly and efficiently, assembly language may be your best (and only) choice. The drawback, of course, is that the software is harder to maintain and has zero portability to other processors. A good programmer can create more efficient code than the average C compiler; however, a good C compiler will probably produce tighter code than a mediocre programmer. Typically, you can include inline assembly within your C code and thereby get the best of both worlds.

At the mere mention of assembly language, many a die-hard programmer begins to quiver in fear, as if just invited into a tiger’s cage. But assembly-language programming is not that hard and can often be a lot of fun. Think of it as being “as one” with the processor.

That said, this is a book about hardware, not software. Embedded software development is already covered by two O’Reilly & Associates books: Programming Embedded Systems in C and C++, by Michael Barr, and Programming with GNU Software, by Mike Loukides and Andy Oram.

When you’re developing your embedded system, it is best to start with a development kit from the processor’s manufacturer. A good development kit will not only provide you with a working example of the machine you’re trying to build (and upon which you can test your code), it should also include a nice Integrated Development Environment (or IDE). The IDE will have a windowing editor, a debugger, a simulator too if you’re lucky, an assembler, and hopefully a C compiler as well. The kit should also come with cables and tools for programming the processor and circuit schematics so you can see what a working machine should look like. Treat the schematics with a small degree of caution. Some (but not all) semiconductor manufacturers farm out the design of their development systems to small, external companies. Some of these companies do a fantastic job, while others seem to employ stray chimpanzees as design engineers. In the latter case, the development system will work, but only through a miracle and by the grace of the digital gods. So, treat the schematics as a rough guide only.

To use the IDE, you will need a desktop computer. And here’s the bad news. Almost without exception, the IDEs will run on only one platform and under only one operating system. No prizes for guessing which one. So, if your preferred environment is a Unix workstation, generally you’re out of luck. While the GNU tools are great, sometimes you just have to resort to the IDE to download code into your target computer, particularly for 8- and 16-bit processors.

Development kit prices range from free (if you’re at the right place at the right time) to many tens of thousands of dollars for some of the really high-end and exotic processors. For most embedded-type processors, you could expect to pay somewhere between $50 and $300, depending on the chip, the manufacturer, and its current whim. The time a development kit will save you probably makes the investment worthwhile.

System Architecture

The processor alone is incapable of successfully performing any tasks. It requires memory (for program and data storage), support logic, and at least one I/O device (input/output device) used to transfer data between the computer and the outside world. The basic computer system is shown in Figure 1-2.


Figure 1-2. Basic computer system

A microprocessor is a processor implemented (usually) on a single, integrated circuit. With the exception of those found in some large supercomputers, nearly all modern processors are microprocessors, and the two terms are often used interchangeably. Common microprocessors in use today are the Intel Pentium series, Motorola/IBM PowerPC, MIPS, ARM, and Sun SPARC. A microprocessor is sometimes also known as a CPU (Central Processing Unit).

A microcontroller is a processor, memory, and some I/O contained within a single, integrated circuit and intended for use in embedded systems. The buses that interconnect the processor with its I/O exist within the same integrated circuit. The range of available microcontrollers is very broad. They range from the tiny PICs and AVRs (to be covered in this book), to PowerPC processors with built-in I/O, intended for embedded applications.

Microcontrollers are very similar to System-On-Chip (SOC) processors, intended for use in conventional computers such as PCs and workstations. SOC processors have a different suite of I/O, reflecting their intended application, and are designed to be interfaced to large banks of external memory. Microcontrollers usually have all their memory on-chip and may provide only limited support for external memory devices.

The memory of the computer system contains both the instructions that the processor will execute and the data it will manipulate. The memory of a computer system is never empty. It always contains something, whether it be instructions, meaningful data, or just the random garbage that appeared in the memory when the system powered up.

Instructions are read (fetched) from memory, while data is both read from and written to memory, as shown in Figure 1-3.


Figure 1-3. Data flow

This form of computer architecture is known as a Von Neumann machine, named after John von Neumann, one of the originators of the concept. With very few exceptions, nearly all modern computers follow this form. Von Neumann computers can be termed control-flow computers. The steps taken by the computer are governed by the sequential control of a program. In other words, the computer follows a step-by-step program that governs its operation. (There are some interesting non-Von Neumann architectures, such as the massively parallel “Connection Machine” and the nascent efforts at building biological and quantum computers, or neural networks.)

A classical Von Neumann machine has several distinguishing characteristics:

There is no real difference between data and instructions.

A processor can be directed to begin execution at a given point in memory, and it has no way of knowing whether the sequence of numbers beginning at that point is data or instructions. The instruction 0x4143 may also be data (the number 0x4143 or the ASCII characters “A” and “C”). The processor has no way of telling what is data or what is an instruction. If a number is to be executed by the processor, it is an instruction; if it is to be manipulated, it is data.

Because of this lack of distinction, the processor is capable of changing its instructions (treating them as data) under program control. And because the processor has no way of distinguishing between data and instruction, it will blindly execute anything that it is given, whether it is a meaningful sequence of instructions or not.

Data has no inherent meaning.

There is nothing to distinguish between a number that represents a dot of color in an image and a number that represents a character in a text document. Meaning comes from how those numbers are treated under the execution of a program.

Data and instructions share the same memory.

This means that sequences of instructions in a program may be treated as data by another program. A compiler creates a program binary by generating a sequence of numbers (instructions) in memory. To the compiler, the compiled program is just data, and it is treated as such. It is a program only when the processor begins execution. Similarly, an operating system loading an application program from disk does so by treating the sequence of instructions of that program as data. The program is loaded to memory just as an image or text file would be, and this is possible due to the shared memory space.

Memory is a linear (one-dimensional) array of storage locations.

The memory space of the processor may contain the operating system, various programs, and their associated data, all within the same linear space.

Each location in the memory space has a unique, sequential address. The address of a memory location is used to specify (and select) that location. The memory space is also known as the address space, and how that address space is partitioned between different memory and I/O devices is known as the memory map.

Some processors, notably the Intel x86 family, have a separate address space for I/O devices, with separate instructions for accessing this space. This is known as ported I/O. However, most processors make no distinction between memory devices and I/O devices within the address space. I/O devices exist within the same linear space as memory devices, and the same instructions are used to access each. This is known as memory-mapped I/O (Figure 1-4). Memory-mapped I/O is certainly the most common form. Ported I/O address spaces are becoming rare, and the use of the term even rarer.


Figure 1-4. Ported versus memory-mapped I/O spaces
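The first two Von Neumann characteristics listed above (no real difference between data and instructions, and no inherent meaning in data) can be demonstrated in a few lines of C. This sketch reads the same stored bits in different ways; it assumes an ASCII character set and a 32-bit IEEE 754 `float`, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes float is a 32-bit IEEE 754 single, as on most current hosts. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Two bytes in memory, matching the 0x4143 example in the text. */
static const uint8_t cell[2] = { 0x41, 0x43 };

/* Interpretation 1: a 16-bit number, first byte most significant. */
uint16_t cell_as_number(void)
{
    return (uint16_t)((cell[0] << 8) | cell[1]);
}

/* Interpretation 2: ASCII characters (0x41 is 'A', 0x43 is 'C'). */
char cell_as_char(int index)
{
    assert(index == 0 || index == 1);
    return (char)cell[index];
}

/* The same idea for 32 bits: copy the bit pattern into a float without
 * changing any bit. memcpy is the portable way to do this in C. */
float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The pattern 0x3F800000 is the integer 1065353216, yet `bits_as_float` returns 1.0 for it. Nothing in the bits changed; only the program's treatment of them did.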

Most microprocessors available are standard Von Neumann machines. The main deviation from this is the Harvard architecture, in which instructions and data have different memory spaces (Figure 1-5), with separate address, data, and control buses for each memory space. This has a number of advantages in that instruction and data fetches can occur concurrently, and the size of an instruction is not set by the size of the standard data unit (word).


Figure 1-5. Harvard architecture


A bus is a physical group of signal lines that have a related function. Buses allow for the transfer of electrical signals between different parts of the computer system and thereby transfer information from one device to another. For example, the data bus is the group of signal lines that carry data between the processor and the various subsystems that constitute the computer. The width of a bus is the number of signal lines dedicated to transferring information. For example, an 8-bit-wide bus transfers 8 bits of data in parallel.

The majority of microprocessors available today (with some exceptions) use the three-bus system architecture (Figure 1-6). The three buses are the address bus, the data bus, and the control bus.


Figure 1-6. Three-bus system

The data bus is bidirectional, the direction of transfer being determined by the processor. The address bus carries the address, which points to the location in memory that the processor wishes to access. It is up to external circuitry to determine in which external device a given memory location exists and to activate that device. This is known as address decoding. The control bus carries information from the processor about the state of the current access, such as whether it is a write or a read operation. The control bus can also carry information back to the processor regarding the current access, such as an address error. Different processors have different control lines, but some control lines are common among many processors. The control bus may consist of output signals such as read, write, valid address, and so on. A processor has several input control lines too, such as RESET, one or more interrupt lines, and a clock input.
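Address decoding and memory-mapped I/O can be modeled in software to see how they fit together. The C sketch below assumes a hypothetical memory map for a 16-bit address bus (ROM at 0x0000-0x7FFF, RAM at 0x8000-0xBFFF, 256 bytes of I/O registers at 0xC000-0xC0FF). The map, the array names, and the choice to return 0xFF for unmapped reads are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory map (invented for illustration):
 *   0x0000-0x7FFF  ROM  (A15 = 0)
 *   0x8000-0xBFFF  RAM  (A15..A14 = 10)
 *   0xC000-0xC0FF  I/O  (A15..A8 = 0xC0)
 */
enum chip_select { SELECT_NONE, SELECT_ROM, SELECT_RAM, SELECT_IO };

/* Address decoding: like the external logic, look only at the upper
 * address lines to decide which device to enable. */
enum chip_select decode_address(uint16_t address)
{
    if ((address & 0x8000u) == 0)
        return SELECT_ROM;
    if ((address & 0xC000u) == 0x8000u)
        return SELECT_RAM;
    if ((address & 0xFF00u) == 0xC000u)
        return SELECT_IO;
    return SELECT_NONE;                 /* unmapped: no device responds */
}

static uint8_t rom[0x8000];
static uint8_t ram[0x4000];
static uint8_t io_regs[0x100];          /* stand-in for peripheral registers */

/* A read cycle: the decoder picks the device from the address alone,
 * so memory and I/O are reached the same way (memory-mapped I/O). */
uint8_t bus_read(uint16_t address)
{
    switch (decode_address(address)) {
    case SELECT_ROM: return rom[address & 0x7FFFu];
    case SELECT_RAM: return ram[address & 0x3FFFu];
    case SELECT_IO:  return io_regs[address & 0x00FFu];
    default:         return 0xFF;       /* assumed value for a floating bus */
    }
}

/* A write cycle: writes to ROM or unmapped space are simply ignored. */
void bus_write(uint16_t address, uint8_t value)
{
    switch (decode_address(address)) {
    case SELECT_RAM: ram[address & 0x3FFFu] = value; break;
    case SELECT_IO:  io_regs[address & 0x00FFu] = value; break;
    default:         break;
    }
}
```

Note that `decode_address` never inspects the low address lines beyond what it needs. Hardware decoders work the same way, which is why a device often appears at more than one address when decoding is incomplete.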


A few years ago, I had the opportunity to wander through, in, and around CSIRAC (pronounced “sigh-rack”). This was one of the world’s first digital computers, designed and built in Sydney, Australia, in the late 1940s. It was a massive machine, filling a very big room with the type of solid hardware that you can really kick. It was quite an experience looking over the old machine. I remember at one stage walking through the disk controller (it was the size of a small room) and looking up at a mass of wires strung overhead. I asked what they were for. “That’s the data bus!” came the reply.

CSIRAC is now housed in the museum of the University of Melbourne. You can take an online tour of the machine, and even download a simulator, at

Processor operation

There are six basic functions that a processor can perform. The processor can write data to system memory or write data to an I/O device; it can read data from system memory or read data from an I/O device; it can read instructions from system memory; and it can perform internal manipulation of data within the processor.

In many systems, writing data to memory is functionally identical to writing data to an I/O device. Similarly, reading data from memory constitutes the same external operation as reading data from an I/O device or reading an instruction from memory. In other words, the processor makes no distinction between memory and I/O.

The internal data storage of the processor is known as its registers. The processor has a limited number of registers, and these are used to contain the current data/operands that the processor is manipulating.


The Arithmetic Logic Unit (ALU) performs the internal arithmetic manipulation of data in the processor. The instructions read and executed by the processor control the data flow between the registers and the ALU, as well as operations performed by the ALU, via the ALU’s control inputs. A symbolic representation of an ALU is shown in Figure 1-7.


Figure 1-7. ALU block diagram

Whenever instructed by the processor, the ALU performs an operation (typically one of addition, subtraction, multiplication, division, NOT, AND, NAND, OR, NOR, XOR, shift left/right, or rotate left/right) on one or more values. These values, called operands, are typically obtained from two registers or from one register and a memory location. The result of the operation is then placed back into a given destination register or memory location. The status outputs indicate any special attributes about the operation, such as whether the result was zero or negative or if an overflow or carry occurred. Some processors have separate units for multiplication and division and for bit shifting, providing faster operation and increased throughput.
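The status outputs are easiest to understand by computing them. Here is a minimal C model of an 8-bit ALU addition that produces the zero, negative, carry, and overflow flags; the flag bit positions and the name `alu_add8` are invented for the sketch, since every processor lays out its status register differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status flags an ALU might report (bit positions are illustrative). */
#define FLAG_Z 0x01u  /* result is zero */
#define FLAG_N 0x02u  /* result is negative (bit 7 set) */
#define FLAG_C 0x04u  /* unsigned carry out of bit 7 */
#define FLAG_V 0x08u  /* signed (two's complement) overflow */

/* Models an 8-bit ALU addition: returns the result and sets *flags. */
uint8_t alu_add8(uint8_t a, uint8_t b, uint8_t *flags)
{
    assert(flags != NULL);
    unsigned sum = (unsigned)a + (unsigned)b;   /* up to 9 bits wide */
    uint8_t result = (uint8_t)sum;
    uint8_t f = 0;

    if (result == 0)    f |= FLAG_Z;
    if (result & 0x80u) f |= FLAG_N;
    if (sum > 0xFFu)    f |= FLAG_C;
    /* Overflow: both operands share a sign that differs from the result's. */
    if (((a ^ result) & (b ^ result) & 0x80u) != 0)
        f |= FLAG_V;

    *flags = f;
    return result;
}
```

Adding 0x7F and 0x01 gives 0x80 with N and V set: as unsigned numbers 127 + 1 = 128 is fine, but as signed bytes the result has wrapped to -128. Adding 0xFF (the byte -1) to 5 gives 4 with only the carry flag set.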

Each architecture has its own unique ALU features, which can vary greatly from one processor to another. However, all are just variations on a theme and all share the common characteristics just described.


Registers are the internal (working) storage for the processor. The number of registers varies significantly between processor architectures. Typically, the processor will have one or more accumulators. These are registers that may have arithmetic operations performed upon them. In some architectures, all the registers function as accumulators, whereas in others, some registers are dedicated for storage only and have limited functionality.

Some processors have index registers that can function as pointers into the memory space. In some architectures, all general-purpose registers can act as index registers; in others, dedicated index registers exist.

All processors will have a program counter (also known as an instruction pointer) that tracks the location in memory of the next instruction to be fetched and executed. All processors have a status register (also known as a condition-code register, or CCR) that consists of various status bits (flags) that reflect the current operational state. Such flags might indicate whether the result of the last operation was zero or negative, whether a carry occurred, if an interrupt is being serviced, and so on.

Some processors also have one or more control registers, consisting of configuration bits that affect processor operation and the operating modes of various internal subsystems. Many peripherals also have registers that control their operation and registers that contain the results of operations. These peripheral registers are normally mapped into the address space of the processor.
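In C, memory-mapped peripheral registers are usually accessed through `volatile`-qualified pointers, so every read and write really reaches the device. The sketch below uses a hypothetical serial-port register block (a status byte with a transmit-ready bit, followed by a data byte); the layout, bit value, and function name are assumptions. Passing the register block as a pointer lets the same code be exercised against an ordinary variable on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout of a simple serial port. On real
 * hardware the block would sit at the address given by the memory map,
 * e.g. (struct uart_regs *)0x8000 (address invented for illustration). */
struct uart_regs {
    volatile uint8_t status;   /* bit 0: transmitter ready (assumed) */
    volatile uint8_t data;     /* writing here sends a byte (assumed) */
};

#define UART_TX_READY 0x01u

/* Talking to the device is just an ordinary load and store; volatile
 * stops the compiler from caching or removing those accesses.
 * Returns 1 if the byte was written, 0 if the device was busy. */
int uart_try_send(struct uart_regs *uart, uint8_t byte)
{
    assert(uart != NULL);
    if ((uart->status & UART_TX_READY) == 0)
        return 0;
    uart->data = byte;
    return 1;
}
```

When the status register reports the transmitter as busy, the function leaves the data register untouched and the caller can retry later.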

Some processors have banks of shadow registers, which save the state of the main registers when the processor begins servicing an interrupt (to be discussed shortly).

Processors are commonly 8-bit, 16-bit, 32-bit, or 64-bit, referring to the width of their registers. An 8-bit processor is invariably low-cost and is suitable for relatively simple control and monitoring applications. If more processing power is required, the larger processors are preferable, although cost and system complexity go up accordingly.


Many processors implement one or more stacks, which serve as temporary storage in external memory. The processor can push a value from a register onto the stack to preserve it for later use. The processor retrieves this value by popping it from the stack back into a register. In some processor architectures, popping is also known as pulling.

Most processors have a stack pointer, which references the next free location on the stack. Some processors implement more than one stack and so have more than one stack pointer. Most stacks grow down through memory. (Some processors have stacks that grow up as the stack is filled.) On a stack that grows down, the stack pointer automatically decrements when the processor pushes a value and increments when it pops one, so that it always points to the next free location. On a stack that grows up, the directions are reversed.
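A downward-growing stack whose pointer marks the next free location can be sketched in C with an array standing in for memory. The size and names are illustrative, and real processors differ: some keep the stack pointer on the most recently pushed item instead.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 16   /* illustrative size */

/* A downward-growing stack; sp holds the index of the next free slot. */
struct stack {
    uint16_t mem[STACK_SIZE];
    int sp;
};

void stack_init(struct stack *s)
{
    s->sp = STACK_SIZE - 1;        /* start at the top of the block */
}

void stack_push(struct stack *s, uint16_t value)
{
    assert(s->sp >= 0);            /* otherwise: stack overflow */
    s->mem[s->sp] = value;         /* store into the free slot... */
    s->sp--;                       /* ...then move down to the next free one */
}

uint16_t stack_pop(struct stack *s)
{
    assert(s->sp < STACK_SIZE - 1);   /* otherwise: stack underflow */
    s->sp++;                          /* move back up to the last value pushed */
    return s->mem[s->sp];
}
```

Pushing 0x1234 and then 0xBEEF leaves the pointer two slots below where it started; popping twice returns 0xBEEF first, then 0x1234, and restores the pointer.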

Addressing modes

The different ways in which an instruction can reference a register or memory location are known as the addressing modes of the processor. The types of addressing modes available within different architectures vary, but the basic ones are as follows:


Register addressing

The instruction deals purely with registers.


Immediate addressing

The instruction has a literal number as an operand.


Direct addressing

The instruction accesses a memory location, specified by a short address. In other words, direct addressing provides access to a subset of the total address space. On a processor with a 16-bit address bus, a direct access would specify an address within the first 256 bytes. On a 32-bit processor, a direct access may specify an address within the first 64K of memory, for example. Direct addressing is used (when possible) to reduce the length of instructions referencing memory. This can reduce code size and therefore instruction fetch time in time-critical applications.


Extended addressing

The instruction accesses a memory location, specified by the full address.


Indirect addressing

The instruction uses the contents of a register as a pointer into memory.


Relative addressing

An offset is specified as part of the addressing. For example, a branch instruction uses relative addressing to add a value to (or subtract a value from) the program counter.
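The addressing modes differ only in where the operand comes from, which a small C model can make explicit. This sketch assumes a toy machine with a 64 KB memory array and eight 16-bit registers; the machine and every function name are hypothetical, and real instruction sets combine and name these modes in their own ways.

```c
#include <assert.h>
#include <stdint.h>

/* A toy machine used only to show where each addressing mode finds its
 * operand. Memory size, register count, and names are hypothetical. */
static uint8_t memory[0x10000];   /* 16-bit address space */
static uint16_t reg[8];

/* Register: the operand is a register's contents. */
uint16_t operand_register(int r)
{
    assert(r >= 0 && r < 8);
    return reg[r];
}

/* Immediate: the operand is the literal carried in the instruction. */
uint16_t operand_immediate(uint16_t literal)
{
    return literal;
}

/* Direct: an 8-bit address reaches only the first 256 bytes, but the
 * instruction needs one fewer address byte. */
uint8_t operand_direct(uint8_t short_address)
{
    return memory[short_address];
}

/* Full 16-bit address. */
uint8_t operand_extended(uint16_t address)
{
    return memory[address];
}

/* Indirect: a register holds the address of the operand. */
uint8_t operand_indirect(int r)
{
    assert(r >= 0 && r < 8);
    return memory[reg[r]];
}

/* Relative: a signed offset is added to the program counter. */
uint16_t branch_target(uint16_t pc, int8_t offset)
{
    return (uint16_t)(pc + offset);
}
```

A branch at 0x0100 with an offset of -4 lands at 0x00FC; with an offset of 0x10 it lands at 0x0110. Because the offset is relative, the same branch instruction works wherever the code is loaded.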

Big-endian and little-endian

Microprocessors are either big endian or little endian in their architecture. This refers to the way in which the processor stores data (16 bits or greater) to memory. A big-endian processor stores the most significant byte at the least significant address, as illustrated in Figure 1-8. In each case, the data has been stored to address 0x0100.


Figure 1-8. Big endian

A little-endian processor stores the most significant byte at the most significant address, as shown in Figure 1-9.


Figure 1-9. Little endian

With the little-endian scheme, the least significant byte of the data travels over the least significant part of the data bus and is stored at the least significant memory location. In other words, in terms of the data path, it is conceptually easier for a programmer to understand. The disadvantage of little endian is that data appears backward in the computer’s memory. Storing the value 0x12345678 to memory places the bytes 78, 56, 34, and 12 at successive addresses, so a memory dump shows 0x78563412. Note that a little-endian processor will read this data back correctly; it’s just harder for a human looking at the memory directly to understand the numbers. Conversely, a big-endian processor storing 0x12345678 to memory results in the bytes 12, 34, 56, and 78 sitting inside the memory chip in that order. This appears (to a human) to make more sense.

Neither scheme has much advantage over the other in terms of operation; they are just two different ways of doing the same thing. When you’re doing high-level programming on a system, the “endian-ness” makes little difference, for you are rarely exposed to it. However, when you are developing and debugging hardware and low-level firmware, you come across it all the time, so an understanding of big endian and little endian is important.
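When firmware must exchange data with a device or protocol of fixed byte order, a common C idiom is to build the bytes explicitly with shifts, which behaves the same on either kind of processor. The sketch below also checks the host's own order by looking at the lowest-addressed byte of a known value; it assumes the host is either big or little endian (rare mixed-endian machines are not handled), and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into four bytes of memory in an explicit order,
 * independent of the host processor's own byte order. */
void store32_big(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);   /* most significant byte at lowest address */
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

void store32_little(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;           /* least significant byte at lowest address */
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Returns 1 if the processor running this code is little endian. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* the byte at the lowest address */
    return first == 1;
}
```

On an x86 PC, which is little endian, `host_is_little_endian()` returns 1, and copying 0x12345678 straight out of a variable yields the bytes 78, 56, 34, 12, exactly the "backward" layout described above.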

Interrupts

Interrupts (also known as traps or exceptions in some processors) are a technique for diverting the processor from the execution of the current program so that it may deal with some event that has occurred. Such an event may be an error from a peripheral or simply that an I/O device has finished the last task it was given and is now ready for another. An interrupt is generated in your computer every time you press a key or move the mouse. Interrupts relieve the processor of having to continuously check the I/O devices to determine whether they require service. Instead, the processor may continue with other tasks, and the I/O devices notify it if and when they require attention by asserting one of the processor’s interrupt inputs. In some processors, interrupts can have different priorities, assigning differing importance to the events that can interrupt the processor. If the processor is servicing a low-priority interrupt, it will pause that service in order to handle a higher-priority interrupt. However, if the processor is servicing an interrupt and a second, lower-priority interrupt occurs, the processor will defer that interrupt until it has finished the higher-priority service.
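These priority rules can be modeled in a few lines of Python. This is a behavioral sketch, not any particular processor’s interrupt controller. It assumes that larger numbers mean higher priority, that a request of equal priority waits, and that deferred requests are held in a pending queue until the current service finishes. `InterruptController` is a hypothetical name.

```python
import heapq

class InterruptController:
    """Toy priority-interrupt model (hypothetical; larger number = higher priority)."""

    def __init__(self):
        self.active = []    # priorities currently being serviced, innermost last
        self.pending = []   # deferred requests, stored negated so heapq acts as a max-heap

    def current_priority(self):
        return self.active[-1] if self.active else None

    def request(self, priority):
        """A device asserts an interrupt of the given priority."""
        current = self.current_priority()
        if current is None or priority > current:
            self.active.append(priority)          # higher priority: preempt the running service
            return "serviced"
        heapq.heappush(self.pending, -priority)   # lower (or equal) priority: wait
        return "deferred"

    def return_from_interrupt(self):
        """The innermost ISR finishes. The best deferred request starts if it outranks
        whatever the processor resumes (the main program counts as lowest)."""
        self.active.pop()
        current = self.current_priority()
        if self.pending and (current is None or -self.pending[0] > current):
            self.active.append(-heapq.heappop(self.pending))
```

For example, while a priority-2 service is running, a priority-5 request preempts it, but a priority-3 request that arrives during the priority-5 service waits. It runs only after the priority-5 service returns, and before the processor resumes the priority-2 service.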

When an interrupt occurs, the processor saves its state by pushing its registers and program counter onto the stack. The processor then loads an interrupt vector into the program counter. The interrupt vector is the address at which an Interrupt Service Routine (ISR) lies. Thus, loading the vector into the program counter causes the processor to begin execution of the ISR, performing whatever service the interrupting device required. The last instruction of an ISR is always a Return from Interrupt instruction. This causes the processor to reload its saved state (registers and program counter) from the stack and resume its original program. Interrupts are largely transparent to the original program: it is completely “unaware” that the processor was interrupted, save for a lost interval of time.
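The four-step sequence (save state, load the vector, run the ISR, return from interrupt) can be sketched as a small Python simulation. The four-register CPU, the `vectors` table, and the `code` table that maps addresses to Python callables are hypothetical stand-ins. On real hardware, the processor performs these steps itself, and the ISR is machine code at the vector address.

```python
from dataclasses import dataclass, field

@dataclass
class CPU:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])
    stack: list = field(default_factory=list)
    vectors: dict = field(default_factory=dict)  # hypothetical: vector number -> ISR address
    code: dict = field(default_factory=dict)     # hypothetical: address -> callable standing in for ISR code

    def interrupt(self, vector_number):
        # 1. Save state: push the program counter and a copy of the registers.
        self.stack.append((self.pc, list(self.regs)))
        # 2. Load the interrupt vector into the program counter.
        self.pc = self.vectors[vector_number]
        # 3. Execute the ISR located at that address.
        self.code[self.pc](self)
        # 4. The ISR ends with Return from Interrupt.
        self.return_from_interrupt()

    def return_from_interrupt(self):
        # Reload the saved program counter and registers from the stack.
        self.pc, self.regs = self.stack.pop()
```

The ISR is free to modify the registers because the original values come back from the stack. This is what makes the interrupt transparent to the interrupted program.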

Processors with shadow registers use these to save their current state, rather than pushing their register bank onto the stack. This avoids a considerable number of memory accesses (and therefore time) when processing an interrupt. However, since only one set of shadow registers exists, a processor servicing nested interrupts must “manually” preserve the saved state held in the shadow registers before servicing a higher-priority interrupt. If it does not, important state information will be lost. Upon returning from an ISR, the contents of the shadow registers are swapped back into the main register array.
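A toy Python model makes the single-bank hazard concrete. It assumes that the hardware copies the working registers into the one shadow bank on entry, overwriting whatever the bank already held, and copies them back on return. `ShadowCPU` and `nested_interrupt_demo` are hypothetical names, and real processors differ in exactly which registers are banked.

```python
class ShadowCPU:
    """Toy processor with a single bank of shadow registers (hypothetical)."""

    def __init__(self, regs):
        self.regs = list(regs)
        self.shadow = [0] * len(regs)

    def enter_interrupt(self):
        # Hardware copies the working registers into the one shadow bank,
        # overwriting anything an earlier interrupt left there.
        self.shadow = list(self.regs)

    def return_from_interrupt(self):
        # The shadow contents are swapped back into the main register array.
        self.regs = list(self.shadow)

def nested_interrupt_demo(manual_save):
    """Main program -> low-priority ISR -> high-priority ISR, then return twice.
    Returns the registers the main program sees after both ISRs have returned."""
    cpu = ShadowCPU([1, 2, 3])          # main program state
    cpu.enter_interrupt()               # low-priority ISR starts
    saved = list(cpu.shadow) if manual_save else None  # ISR preserves the shadow bank itself
    cpu.regs = [10, 20, 30]             # low-priority ISR uses the registers
    cpu.enter_interrupt()               # higher-priority interrupt arrives
    cpu.regs = [100, 200, 300]          # high-priority ISR uses the registers
    cpu.return_from_interrupt()         # back in the low-priority ISR
    if manual_save:
        cpu.shadow = saved              # put the main program's state back in the bank
    cpu.return_from_interrupt()         # back in the main program
    return cpu.regs
```

Without the manual save, the main program resumes with the low-priority ISR’s working values instead of its own.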

Hardware interrupts

There are two ways of telling when an I/O device (such as a serial controller or a disk controller) is ready for the next sequence of data to be transferred. The first is busy waiting or polling, in which the processor continuously checks the device’s status register until the device is ready. This is fairly wasteful of the processor’s time but is the simplest to implement.
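A minimal busy-wait loop looks like the following. In C firmware, the status register would be read through a `volatile` pointer to a memory-mapped address. In this Python sketch, a hypothetical `SimulatedDevice` object stands in for it. The register layout (bit 0 means ready) and the timeout are assumptions added so the example terminates.

```python
class SimulatedDevice:
    """Stand-in for a peripheral's status register (hypothetical layout: bit 0 = ready).
    It reports busy for a fixed number of reads, then ready."""
    READY_BIT = 0x01

    def __init__(self, busy_reads):
        self._busy_reads = busy_reads

    def read_status(self):
        if self._busy_reads > 0:
            self._busy_reads -= 1
            return 0x00
        return self.READY_BIT

def busy_wait(device, max_polls=1_000_000):
    """Poll the status register until the ready bit is set.
    Returns how many reads it took; raises TimeoutError after max_polls reads."""
    for polls in range(1, max_polls + 1):
        if device.read_status() & device.READY_BIT:
            return polls
    raise TimeoutError("device did not become ready")
```

Every read that returns busy is processor time spent doing nothing useful, which is the waste the paragraph describes.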

A better way is for the device to generate an interrupt to the processor when it is ready for a transfer to take place. Small, simple processors may have only one or two interrupt inputs, so several external devices may have to share the interrupt lines of the processor. When an interrupt occurs, the processor must check each device to determine which one generated the interrupt. (This can also be considered a form of polling.) The advantage of interrupt polling over ordinary polling is that the polling occurs only when there is a need to service a device. Polling interrupts is suitable only in systems that have a small number of devices; otherwise, the processor will spend too long trying to determine the source of the interrupt.
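An ISR for a shared interrupt line can be sketched as a loop over the devices that checks each one’s pending flag. The `Device` class, with its `pending` flag and `service()` method, is a hypothetical model. Real devices expose this through their own status registers.

```python
class Device:
    """Hypothetical peripheral sharing one interrupt line."""

    def __init__(self, name, pending=False):
        self.name = name
        self.pending = pending     # True when this device is requesting service

    def service(self):
        self.pending = False       # servicing the device clears its request

def shared_line_isr(devices):
    """Called when the shared interrupt line is asserted: poll every device and
    service each one that is requesting attention. Returns the serviced names."""
    serviced = []
    for device in devices:
        if device.pending:
            device.service()
            serviced.append(device.name)
    return serviced
```

Because the devices are checked in list order, that order acts as an implicit priority. The cost of the ISR also grows with the number of devices, which is why this approach suits only small systems.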

The other technique of servicing an interrupt is to use vectored interrupts, in which the interrupting device is able to specify which interrupt vector the processor is to use. Vectored interrupts considerably reduce the time it takes the processor to determine the source of the interrupt. Because interrupt requests can come from more than one source, possibly at the same time, it is necessary to assign priorities (levels) to the different interrupts. This can be done in either hardware or software, depending on the particular application. In this scheme, the processor has numerous interrupt lines, with each interrupt corresponding to a given interrupt vector. So, for example, when an interrupt of priority 7 occurs (the interrupt lines corresponding to level 7 are asserted), the processor loads vector 7 into its program counter and starts executing the service routine specific to interrupt 7.
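A sketch of vectored dispatch follows. It assumes eight levels numbered 0 to 7, with 7 as the highest, and uses a dictionary of placeholder routines in place of the vector addresses. Both are assumptions for illustration. Unlike interrupt polling, nothing here asks each device whether it interrupted: the asserted level selects the routine directly.

```python
NUM_LEVELS = 8  # assumption: levels 0 (lowest) through 7 (highest)

def make_vector_table():
    """Map each interrupt level to its own service routine (placeholders here)."""
    return {level: (lambda lvl=level: f"ISR for interrupt {lvl}")
            for level in range(NUM_LEVELS)}

def dispatch(asserted_levels, vector_table):
    """Select the highest-priority asserted level and run the routine at its vector.
    Returns None if no interrupt line is asserted."""
    if not asserted_levels:
        return None
    level = max(asserted_levels)
    return vector_table[level]()
```

The `lvl=level` default argument binds each level to its own routine. Without it, every lambda would see the loop variable’s final value.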

Vectored interrupts can be taken one step further. Some processors and devices allow the device itself to place the appropriate vector onto the data bus when it generates an interrupt. This makes the system even more versatile: instead of being limited to one interrupt per peripheral, each device can supply an interrupt vector specific to the event that is causing the interrupt. However, the processor must support this feature, and most do not.

Some processors have a feature known as a fast hardware interrupt. With this interrupt, only the program counter is saved; the processor assumes that the ISR will protect the contents of the registers by manually saving their state as required. Fast interrupts are useful when an I/O device requires a very fast response from a processor and cannot wait for the processor to save all its registers to the stack. A special (and separate) interrupt line is used to generate fast interrupts.

Software interrupts

A software interrupt is an interrupt generated by an instruction. It typically has the lowest priority and is generally used by programs to request that the system software (operating system or firmware) perform a service on their behalf.

So why are software interrupts used? Why isn’t the appropriate section of code called directly? For that matter, why use an operating system to perform tasks for us at all? It gets back to compatibility. Jumping to a subroutine is jumping to a specific address. A future version of the system software may not locate the subroutines at the same addresses as earlier versions. By using a software interrupt, our program does not need to know where the routines lie. It relies on the entry in the vector table to direct it to the correct location.
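The compatibility argument can be sketched in Python. The application refers to a service only by number, and the system software’s vector table decides which routine actually runs. `SVC_PUTCHAR`, `SystemSoftware`, and `application` are hypothetical names, and a dictionary of functions stands in for a table of addresses.

```python
SVC_PUTCHAR = 0x10  # hypothetical service number

class SystemSoftware:
    """Firmware/OS stand-in that owns a vector table of service routines."""

    def __init__(self):
        self.vector_table = {}   # service number -> routine (stands in for an address)
        self.console = []        # where the hypothetical putchar service writes

    def install(self, number, routine):
        self.vector_table[number] = routine

    def software_interrupt(self, number, *args):
        # Like a trap instruction: look up the vector, then transfer control to it.
        return self.vector_table[number](*args)

def application(system):
    """Application code: knows only the service number, never the routine's location."""
    for ch in "hi":
        system.software_interrupt(SVC_PUTCHAR, ch)
```

Installing a different routine behind `SVC_PUTCHAR`, as a later version of the system software might, requires no change to `application`.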


There are two major approaches to processor architecture: Complex Instruction Set Computer (CISC, pronounced “sisk”) processors and Reduced Instruction Set Computer (RISC) processors. Classic CISC processors are the Intel x86, Motorola 68xxx, and National Semiconductor 32xxx processors and, to a lesser degree, the Intel Pentium. Common RISC architectures are the Motorola/IBM PowerPC, the MIPS architecture, Sun’s SPARC, the ARM, the Atmel AVR, and the Microchip PIC.

CISC processors have a single processing unit, external memory, a relatively small register set, and many hundreds of different instructions. In many ways, they are just smaller versions of the processing units of mainframe computers from the 1960s.

The tendency in processor design throughout the late ’70s and early ’80s was toward bigger and more complicated instruction sets. Need to input a string of characters from an I/O port? Well, with CISC (80x86 family), there’s a single instruction to do it! The number of distinct opcodes can easily exceed a thousand in some CISC processors, such as the Motorola 68000. This had the advantage of making the assembly-language programmer’s job easier: you had to write fewer lines of code to get the job done. Since memory was slow and expensive, it also made sense to make each instruction do more. This reduced the number of instructions needed to perform a given function and thereby reduced both memory space and the number of memory accesses required to fetch instructions. As memory got cheaper and faster and compilers became more efficient, the relative advantages of the CISC approach began to diminish. One main disadvantage of CISC is that the processors themselves become increasingly complicated as a consequence of supporting such a large and diverse instruction set. The control and instruction decode units are complex and slow, the silicon is large and hard to produce, and the chips consume a lot of power and therefore generate a lot of heat. As processors became more advanced, the overhead that CISC imposed on the silicon became oppressive.

A given processor feature, considered alone, may increase processor performance yet actually decrease the performance of the total system if it increases the total complexity of the device. It was found that by streamlining the instruction set to the most commonly used instructions, the processors became simpler and faster. Fewer cycles are required to decode and execute each instruction, and the cycles are shorter. The drawback is that more (simpler) instructions are required to perform a task, but this is more than made up for by the performance boost to the processor. For example, if the cycle time and the number of cycles per instruction are each reduced by a factor of 4, while the number of instructions required to perform a task grows by 50%, the processor completes the task about 10.7 times faster (4 × 4 ÷ 1.5). Even if the instruction count doubled, the speedup would still be a factor of 8.
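The arithmetic follows from execution time = instruction count × cycles per instruction × cycle time. This small Python helper (a hypothetical name) computes the speedup for given reduction factors. With a 4× shorter cycle, 4× fewer cycles per instruction, and 50% more instructions, the ratio is 16 ÷ 1.5, about 10.67. If the instruction count doubles instead, the ratio is exactly 8.

```python
def speedup(cycle_time_factor, cpi_factor, instruction_growth):
    """Relative speed of the new design versus the old one.

    execution time = instruction count x cycles per instruction x cycle time
    cycle_time_factor, cpi_factor: how many times smaller each quantity becomes
    instruction_growth: fractional increase in instruction count (0.5 means +50%)
    """
    old_time = 1.0  # normalize the old design's execution time to 1
    new_time = (1.0 + instruction_growth) / (cpi_factor * cycle_time_factor)
    return old_time / new_time
```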

The realization of this led to a rethinking of processor design. The result was the RISC architecture, which has led to the development of very high performance processors. The basic philosophy behind RISC is to move the complexity from the silicon to the language compiler. The hardware is kept as simple and fast as possible.

A given complex instruction can be performed by a sequence of much simpler instructions. For example, many processors have an xor (exclusive OR) instruction for bit manipulation, and they also have a clear instruction to set a given register to zero. However, a register can also be set to zero by xor-ing it with itself. Thus, the separate clear instruction is no longer required. It can be replaced with the already-present xor. Further, many processors are able to clear a memory location directly, by writing zeros to it. That same function can be implemented by clearing a register and then storing that register to the memory location. The instruction to load a register with a literal number can be replaced with clearing a register, followed by an add instruction with the literal number as its operand. Thus, six instructions (xor, clear reg, clear memory, load literal, store, and add) can be replaced with just three (xor, store, and add).

So the following CISC assembly pseudocode:

clear 0x1000    ; clear memory location 0x1000
load  r1,#5     ; load register 1 with the value 5

becomes the following RISC pseudocode:

xor   r1,r1     ; clear register 1
store r1,0x1000 ; clear memory location 0x1000
add   r1,#5     ; load register 1 with the value 5

The resulting code size is bigger, but the reduced complexity of the instruction decode unit can result in faster overall operation. Dozens of such code optimizations exist to give RISC its simplicity.
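To check that the two sequences really are equivalent, here is a tiny Python interpreter for just the pseudocode instructions above. It is an illustration, not a real instruction set. The tuple encoding of instructions, the 8 KB memory filled with 0xFF, and the starting value of r1 are arbitrary assumptions.

```python
def run(program, memory_size=0x2000):
    """Execute a list of (opcode, operands...) tuples; return (registers, memory)."""
    regs = {"r1": 0xDEAD}            # arbitrary starting contents
    mem = [0xFF] * memory_size       # arbitrary starting contents
    for op, *args in program:
        if op == "clear":            # CISC: write zero directly to memory
            mem[args[0]] = 0
        elif op == "load":           # CISC: load a literal into a register
            regs[args[0]] = args[1]
        elif op == "xor":            # RISC: rd = rd XOR rs
            regs[args[0]] ^= regs[args[1]]
        elif op == "store":          # RISC: memory[address] = register
            mem[args[1]] = regs[args[0]]
        elif op == "add":            # RISC: register += literal
            regs[args[0]] += args[1]
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs, mem

CISC = [("clear", 0x1000), ("load", "r1", 5)]
RISC = [("xor", "r1", "r1"), ("store", "r1", 0x1000), ("add", "r1", 5)]
```

Both programs leave r1 holding 5 and memory location 0x1000 holding 0. The RISC version needs one more instruction but uses only three simple operation types.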

RISC processors have a number of distinguishing characteristics. They have large register sets (in some architectures exceeding a thousand), thereby reducing the number of times the processor must access main memory. Often-used variables can be left inside the processor, reducing the number of accesses to (slow) external memory. Compilers of high-level languages (such as C) take advantage of this to optimize processor performance.

Smaller and simpler instruction decode units give RISC processors fast instruction execution, and they also reduce the size and power consumption of the processing unit. Generally, RISC instructions take only one or two cycles to execute (this depends greatly on the particular processor). This is in contrast to a CISC processor, on which instructions may take many tens of cycles to execute. For example, integer multiplication on an 80486 CISC processor can take up to 42 cycles to complete. The same operation on a RISC processor may take just one cycle. Instructions on a RISC processor have a simple format. All instructions are generally the same length, which makes instruction decode units simpler.

RISC processors implement what is known as a load/store architecture. This means that the only instructions that actually reference memory are load and store. In contrast, many (most) instructions on a CISC processor may access or manipulate memory. On a RISC processor, all other instructions work on the registers only. This helps most RISC instructions complete in a single cycle. As a consequence, RISC processors do not have the range of addressing modes found on CISC processors.

RISC processors also often have pipelined instruction execution. This means that while one instruction is being executed, the next instruction in the sequence is being decoded and the one after that is being fetched. At any given moment, several instructions will be in the pipeline and in the process of being executed. Again, this improves processor performance. Thus, even though not all instructions may take a single cycle to complete, the processor may issue and retire instructions on each cycle, thereby achieving effective single-cycle execution. Some RISC processors also have overlapped instruction execution: load operations may allow subsequent, unrelated instructions to continue executing before the data requested by the load has been returned from memory. This allows these instructions to overlap the load, further improving processor performance.
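The benefit of pipelining can be estimated with a simple count. This sketch assumes an ideal three-stage fetch/decode/execute pipeline with no stalls, hazards, or branch penalties, which real processors do not always achieve. Under those assumptions, n instructions take k + (n - 1) cycles on a k-stage pipeline instead of n × k. The function names are hypothetical.

```python
STAGES = ("fetch", "decode", "execute")

def cycles_unpipelined(n_instructions, stages=len(STAGES)):
    """Each instruction goes through every stage before the next one starts."""
    return n_instructions * stages

def cycles_pipelined(n_instructions, stages=len(STAGES)):
    """Ideal pipeline: filling it takes `stages` cycles, then one instruction retires per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)

def schedule(n_instructions, stages=STAGES):
    """For each cycle, which instruction (by index) occupies each stage; None means empty."""
    table = []
    for cycle in range(cycles_pipelined(n_instructions, len(stages))):
        row = {}
        for depth, name in enumerate(stages):
            index = cycle - depth
            row[name] = index if 0 <= index < n_instructions else None
        table.append(row)
    return table
```

In `schedule(4)`, cycle 2 shows instruction 2 being fetched, instruction 1 being decoded, and instruction 0 executing. That is the overlap described above, and it is why 100 instructions need about 102 cycles rather than 300.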

Due to their computing power and low power consumption, RISC processors are becoming widely used, particularly in embedded computer systems, and many RISC attributes are appearing in what are traditionally CISC architectures (such as with the Intel Pentium). Ironically, many RISC architectures are adding some CISC-like features, and so the distinction between RISC and CISC is blurring.

An excellent discussion of RISC architectures and processor performance topics can be found in Kevin Dowd and Charles Severance’s High Performance Computing, available from O’Reilly & Associates.

So, which is better for embedded and industrial applications, RISC or CISC? If power consumption needs to be low, then RISC is probably the better architecture to use. However, if the available space for program storage is small, then a CISC processor may be a better alternative, since CISC instructions get more “bang” for the byte.

Digital Signal Processors

A special type of processor architecture is that of the Digital Signal Processor (DSP). These processors have instruction sets and architectures optimized for numerical processing of array data. They often extend the Harvard architecture concept further, not only by having separate data and code spaces, but also by splitting the data spaces into two or more banks. This allows concurrent instruction fetch and data accesses for multiple operands. As such, DSPs can have very high throughput and can outperform both CISC and RISC processors in certain applications.

DSPs have special hardware well suited to numerical processing of arrays. They often have hardware looping, whereby special registers allow for and control the repeated execution of an instruction sequence. This is also often known as zero-overhead looping, since no conditions need to be explicitly tested by the software as part of the looping process. DSPs often have dedicated hardware for increasing the speed of arithmetic operations. High-speed multipliers, multiply-and-accumulate (MAC) units, and barrel shifters are common features.
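The inner loop of a finite impulse response (FIR) filter is the classic workload for these features: each output sample is a sum of products. This Python sketch shows only the arithmetic pattern. On a DSP, the inner loop would run as a zero-overhead hardware loop with one multiply-and-accumulate per tap. The function name and the treatment of samples before the start of the input as zero are assumptions.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter: y[n] = sum over k of c[k] * x[n - k].
    Samples before the start of the input are treated as zero."""
    taps = len(coefficients)
    output = []
    for n in range(len(samples)):
        acc = 0                       # the accumulator register
        for k in range(taps):         # on a DSP: a hardware loop, one MAC per tap
            if n - k >= 0:
                acc += coefficients[k] * samples[n - k]  # multiply-and-accumulate
        output.append(acc)
    return output
```

Feeding in a single 1 followed by zeros returns the coefficients themselves (the filter’s impulse response). This is a quick way to check a filter implementation.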

DSPs are commonly used in embedded applications, and many conventional embedded microcontrollers include some DSP functionality.
