6

–––––––––––––––––––––––

Diagnosability of Multiprocessor Systems

Chia-Wei Lee and Sun-Yuan Hsieh

6.1   INTRODUCTION

The rapid advances in very-large-scale integration (VLSI) technology and wafer-scale integration (WSI) technology have made it possible to design and produce a multiprocessor system containing hundreds or even thousands of processors (nodes) on a single chip. As the number of nodes in a multiprocessor system increases, node fault identification in such systems becomes more crucial for reliable computing. The process of discriminating between faulty nodes and fault-free nodes in a system is called fault diagnosis. When a faulty node is identified, it is replaced by a fault-free node to maintain the system’s reliability. The diagnosability of a system is the maximum number of faulty nodes that the system can identify.

Determining the diagnosability of multiprocessor systems based on various strategies and models has been the focus of a great deal of research in recent years (e.g., see References 125). Among the proposed models, two of which, namely, the PMC model (after Preparata, Metze, and Chien [19] and the MM model (after Maeng and Malek [18]), are well known and widely used. In the PMC model, every node is capable of testing whether another node v is faulty if there exists a communication link between them. The PMC model assumes that the tests of faulty nodes performed by fault-free ones always return one and that the tests performed by faulty nodes return ...

Get Scalable Computing and Communications: Theory and Practice now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.