Some Thoughts on Debugging

Debugging is as much an art as a science. You can load a workbench to breaking point with all sorts of expensive test equipment, yet without a logical approach and a clear mind, elusive bugs will never be found. Conversely, by “right thinking,” the strangest of bugs can be isolated with a minimum of tools. While it is true that the more complex the system under test, the harder it is to nail down a fault through detection, it is also true that the most advanced and useful debugging tool you have at your disposal is your own brain. Therefore, learning to debug is learning to think carefully and clearly.

Debugging hardware can be a lot trickier than debugging software. With code, you can always put in some diagnostics to inspect the execution. That’s not to say that debugging software is trivial; far from it. But with hardware, it is often a case of either everything works or nothing works. Software has the advantage that it can be brought into operation gradually. For hardware, you need to have an awful lot working right from the start.

The essence of debugging is establishing what works and what doesn’t work. As designs grow in complexity, finding hardware and design faults can become quite a complex problem.

For example, your embedded system may not be outputting characters through its serial port. Why? Perhaps it’s a bug in the code. Maybe there’s a cable fault. Maybe the RS-232C interface chip is dead. Maybe the serial chip itself is dead. There may be a timing problem with the serial chip’s oscillator or a voltage-level problem. Perhaps the processor itself is not coming out of reset and therefore not executing code at all. If so, maybe it’s the power-on reset circuit failing to kick in or the brownout detector kicking in when it shouldn’t. Maybe a data line between the processor and the serial chip is not connected, perhaps due to a manufacturing fault with the PCB. Or maybe it wasn’t soldered correctly. Perhaps your voltage regulator isn’t operating properly, or maybe you’ve a faulty power supply. And those are just the obvious causes that spring to mind. There are a thousand others lurking, with big teeth and a nasty disposition.

Any one problem may have a multitude of possible causes. Debugging is therefore about isolating a fault, and this is best done with a “20 questions” approach: use divide-and-conquer to narrow down where the problem lies.

Let’s take the example of the faulty serial port problem. You discover the problem when you first try to test the serial port. Your simple test code fails to output a character. Is the problem in software or hardware? If hardware, is the problem with the cable, the serial chip(s), or a more fundamental problem with the core system? Check the cable and the terminal (or host PC) first. Disconnect the cable from the embedded computer, and with a piece of metal (a screwdriver blade will do), short out pins two and three (Rx and Tx) on the cable connector. Now type something on the terminal (or the terminal software on the PC). What comes out of the terminal should echo back through the short and appear on the screen. That will tell you whether there is a cable fault and whether the terminal is set up correctly.

If that works, then the problem lies in your embedded system. Replace your serial test code with code that does something simpler (like waggle a digital I/O line or flash an LED). That simple action will tell you volumes. (Archimedes once said, “Give me a lever long enough and I will move the world.” Well, give me a status LED and enough time, and I’ll debug the world too!) It will tell you whether your processor is executing code correctly, which in turn shows that the processor and ROM (if a separate chip) have power and are communicating correctly. It shows that the reset circuit, brownout detector, oscillator, voltage regulator, address decoder, and other support logic are OK. If any of these are failing, then the processor will not be executing code and therefore that I/O line will not waggle or that LED will not flash. By that simple test, you have ruled out a plethora of possible faults.
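As a rough illustration, the LED test might look something like the following C sketch. The bit mask `STATUS_LED_MASK`, the function name `waggle_led`, and the busy-wait delay are all hypothetical; the real port address and bit position come from your processor’s datasheet. The port is passed in as a pointer so that the routine can be tried on a host PC before it goes anywhere near the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position of the status LED within an output port. */
#define STATUS_LED_MASK 0x01u

/* Crude busy-wait; the count is a placeholder to tune for the real clock. */
static void delay(volatile uint32_t count)
{
    while (count--) {
        /* burn cycles */
    }
}

/* Toggle the LED bit `blinks` times on the port register at `port`. */
void waggle_led(volatile uint8_t *port, unsigned blinks, uint32_t delay_count)
{
    assert(port != 0);
    for (unsigned i = 0; i < blinks; i++) {
        *port ^= STATUS_LED_MASK;   /* flip the LED bit, leave the others alone */
        delay(delay_count);
    }
}
```

On the target, you would call this in an endless loop with `port` pointing at the real output register, and choose a delay long enough for the flashing to be visible by eye.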

If that test failed, you know to look at the core system, checking the oscillator, reset, and voltage regulator for correct operation. Divide and conquer. If the test passed, the core system is running, and the fault lies somewhere in the serial subsystem. Most serial chips include some digital I/O that can be set manually (such as RTS). Write some test code that sets and clears such a line, and check the pin with a meter or oscilloscope. This simple test will show whether you can talk to the chip. If it passes, you know to look at either your character-output software or the RS-232 driver. If it fails, then the problem lies in talking to the chip. Use an oscilloscope to check the chip select and other control signals going to the serial chip. Are they active? Are they reasonable? Write some software that continually “jams” a byte at a register in the serial chip. While meaningless to the serial chip, a continuous write of the same value gives you repetitive bus activity that is easy to observe. The (pseudo) code to do this is:

        load    r1,#0x55            ; load the bit pattern %01010101
loop    store   r1,serial_control   ; write it to the serial chip
        jump    loop                ; continuously

You should expect to see the preceding bit pattern on the data bus (and, importantly, on the appropriate pins of the serial chip) at the same time that the chip select and write enable are asserted.

This will enable you to locate a problem with the processor writing to the serial chip. Alternatively, if you can demonstrate that you can write to the chip correctly, then the problem lies either in the software or between the serial chip and the serial connector. By using the divide-and-conquer approach, you can isolate where a problem lies. Devise tests to prove each aspect of system operation.
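The earlier test of setting RTS by hand could be sketched in C along these lines. This is only an illustration under stated assumptions: it assumes a 16550-style register layout (modem control register at offset 4 from the chip’s base address, RTS as bit 1, registers one byte apart), and the function name `uart_set_rts` is hypothetical. Your serial chip may differ, so check its datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16550-style layout: modem control register at offset 4, RTS is bit 1. */
#define UART_MCR_OFFSET 4u
#define UART_MCR_RTS    0x02u

/* Set or clear the RTS bit, leaving the other modem control bits unchanged. */
void uart_set_rts(volatile uint8_t *uart_base, int asserted)
{
    assert(uart_base != 0);
    volatile uint8_t *mcr = uart_base + UART_MCR_OFFSET;

    if (asserted)
        *mcr |= UART_MCR_RTS;             /* set the RTS bit */
    else
        *mcr &= (uint8_t)~UART_MCR_RTS;   /* clear the RTS bit */
}
```

Call it alternately with 1 and 0 while watching the RTS pin. If the pin changes state each time, the processor can talk to the chip; if it never moves, go back to the chip select and control signals.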

Often you will be faced with a bug that makes no sense. Something should be working, and it is not. Everything you check seems right, but the total system just isn’t working. It can be very perplexing. You have made a common error—you have made an assumption. Somewhere, even though you may not be consciously aware of it, you have assumed that some little detail is correct, when in fact it is not. This is the hardest obstacle to overcome. When you say to yourself, “It should be working, but it isn’t! It doesn’t make sense!” then say to yourself, “There is still something I haven’t checked.” Go looking for it. If you can’t find it, then you haven’t looked hard enough.

When designing your system and laying out the PCB, remember that you will have to debug it. So, design it with debugging in mind. Include one or more status LEDs. These are invaluable for debugging embedded hardware. Sure, you can do a lot with a remote debugger (such as GNU’s gdb), but you have to get the hardware working to a certain level before the debugger can be made to run. Status LEDs will help you get there.

You are also going to need to look at signals with an oscilloscope, so include a ground pin on your circuit board onto which you can clip. Also, make sure that you will be able to get an oscilloscope probe to every circuit trace on the board to examine what’s going on. If you can’t get to a track, you can’t ensure that there’s no problem with that particular signal.

So even at the design stage, think carefully about how you can test the subsystems and isolate problems, and put the necessary support into your design.

In the next part of the book, we’ll look at some embedded processors and how you design systems based upon them. We start, in the next chapter, with the Microchip PIC processor family.
