High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI

by Joseph D Sloan
November 2004
Intermediate to advanced
368 pages
10h 24m
English
O'Reilly Media, Inc.

Chapter 16. Debugging Parallel Programs

If you are using a cluster, you are probably dealing with large, relatively complicated problems. As problem complexity grows, the likelihood of errors grows as well. In these circumstances, debugging becomes an increasingly important skill. It is a simple fact of life: if you write code, you are going to have to debug it.

In this chapter, we’ll begin by looking at why debugging parallel programs can be challenging. Next, we’ll review debugging in general. Finally, we’ll look at how the traditional serial debugging approaches can be extended to parallel problems. Parallel debugging is an active research area, so there is a lot to learn. We’ll stick to the basics here.

Debugging and Parallel Programs

Parallel code presents new difficulties, and the task of coordinating processes can result in some novel errors not seen in serial code. While elaborate classification schemes for parallel problems exist, there are two broad categories of errors in parallel code that you are likely to come up against. The first is synchronization problems, which stem from the nondeterminism inherent in parallel code. The second is deadlock. While we can further subclassify problems, you shouldn’t be too concerned about finer distinctions. If you can determine the source of an error and how to correct it, you can leave the classification to the more academically inclined.
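To make the second category concrete, here is a minimal sketch of a deadlock, written in Python rather than MPI so that it runs on a single machine. It is an illustration under stated assumptions, not code from this book: two threads stand in for two cluster processes, two queues stand in for message channels, and the names `worker` and `run_exchange` are hypothetical. Each side tries to receive before it sends, so neither message is ever sent. A timeout is used only so the demonstration ends; a real blocking receive would simply hang.

```python
import queue
import threading

def worker(name, inbox, outbox, results, timeout=0.2):
    """Receive first, then send: the ordering that causes the deadlock."""
    try:
        # Blocks here: the peer is also waiting to receive, so nothing arrives.
        msg = inbox.get(timeout=timeout)
    except queue.Empty:
        results[name] = "timed out waiting to receive"
        return
    # Never reached in this sketch, because both sides wait first.
    outbox.put(f"hello from {name}")
    results[name] = f"received {msg!r}"

def run_exchange():
    """Run two workers that each wait for the other's message."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=("A", b_to_a, a_to_b, results)),
        threading.Thread(target=worker, args=("B", a_to_b, b_to_a, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_exchange())
```

If one of the two workers sent before receiving, the exchange would complete. Breaking the symmetry in this way is a common remedy for this kind of deadlock.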

Synchronization problems result from variations in the order in which instructions may be executed when spread among ...
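One way to see how execution order alone can change a result is to enumerate the possible orders by hand. The following Python sketch is an illustration, not an example from this book, and its scenario is an assumption: two hypothetical processes, `P0` and `P1`, each read a shared value `x` and then write back that value plus one. The function `final_values` (also a hypothetical name) tries every interleaving of the four steps and collects the final values of `x`.

```python
from itertools import permutations

def final_values():
    """Return every final value of x reachable by interleaving two increments."""
    # Each process performs two steps: read x into a private copy,
    # then write that copy plus one back to x.
    steps = [("P0", "read"), ("P0", "write"), ("P1", "read"), ("P1", "write")]
    outcomes = set()
    for order in permutations(steps):
        # Skip orders in which a process would write before it has read.
        if any(order.index((p, "read")) > order.index((p, "write"))
               for p in ("P0", "P1")):
            continue
        x, local = 0, {}
        for proc, op in order:
            if op == "read":
                local[proc] = x
            else:
                x = local[proc] + 1
        outcomes.add(x)
    return outcomes

if __name__ == "__main__":
    print(sorted(final_values()))
```

When one process finishes both steps before the other starts, `x` ends at 2. When the two reads happen before either write, one increment is lost and `x` ends at 1. The program text is the same in every case; only the order of execution differs.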

ISBN: 0596005709