Performance Analysis of Fault Tolerant Algorithms for the Heat Equation in Three Space Dimensions
H. Ltaief, M. Garbey, E. Gabriel
Dept of Computer Science, University of Houston, Houston, TX 77204, USA
Publisher Summary
Building on distributed, uncoordinated checkpointing, the numerical methods presented in this chapter can reconstruct a consistent state of a parallel application even though the checkpoints of different processes were stored at different time steps. The main purpose of these algorithms is to avoid two costs of coordinated checkpointing: the expensive rollback to the last consistent distributed checkpoint, which discards all work done since then, and the significant overhead that coordinated checkpoints add for applications running on thousands of processors. The first ...
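To make the idea of recovering from checkpoints taken at different time steps more concrete, here is a minimal sketch in Python. It is not the chapter's method: it uses a 1D (rather than 3D) heat equation u_t = u_xx advanced with an explicit finite-difference scheme on two subdomains, and only subdomain B is lost. It assumes, hypothetically, that the surviving neighbour A logged its interface value at every step; with that log, B is recomputed forward from its own older checkpoint to the current step while A never rolls back. The names `step_block`, `simulate` and `rebuild_b`, the grid sizes and the ratio r = Δt/Δx² = 0.4 (within the explicit stability limit of 1/2) are illustrative choices.

```python
"""Hypothetical sketch: rebuilding one subdomain of a 1D heat equation
from an uncoordinated checkpoint, assuming the neighbour logged its
interface values. Not the chapter's (3D) algorithm."""


def step_block(u, left_ghost, right_ghost, r):
    """One explicit (FTCS) step of u_t = u_xx on a block, given ghost values."""
    ext = [left_ghost] + u + [right_ghost]
    return [ext[i] + r * (ext[i - 1] - 2.0 * ext[i] + ext[i + 1])
            for i in range(1, len(ext) - 1)]


def simulate(n=20, m=10, steps=40, ckpt_b=10, r=0.4):
    """Failure-free run on two blocks A = u[:m] and B = u[m:], zero Dirichlet BCs."""
    u0 = [min(i + 1, n - i) / n for i in range(n)]  # hat-shaped initial data
    a, b = u0[:m], u0[m:]
    b_ckpt, b_ckpt_step = None, None
    b_left_log = {}  # assumed log: A's interface value seen by B at step k
    for k in range(steps):
        if k == ckpt_b:
            # B checkpoints on its own schedule, without coordinating with A
            b_ckpt, b_ckpt_step = list(b), k
        b_left_log[k] = a[-1]
        # the right-hand side uses the old a and b, so both blocks see step-k data
        a, b = step_block(a, 0.0, b[0], r), step_block(b, a[-1], 0.0, r)
    return b, b_ckpt, b_ckpt_step, b_left_log


def rebuild_b(b_ckpt, start, end, left_log, r=0.4):
    """Recompute B from its checkpoint at `start` up to step `end` using the log."""
    b = list(b_ckpt)
    for k in range(start, end):
        b = step_block(b, left_log[k], 0.0, r)
    return b


if __name__ == "__main__":
    b_final, b_ckpt, start, log = simulate()
    b_rebuilt = rebuild_b(b_ckpt, start, 40, log)  # 40 = default `steps`
    print("rebuilt matches failure-free run:", b_rebuilt == b_final)
```

Because the rebuild repeats exactly the same floating-point operations on the same inputs, the recovered block matches the failure-free run bit for bit. The interface log is the key assumption here; a code that does not keep such a log would need another way to obtain the missing interface values.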