Book description
Architecture Design for Soft Errors provides a comprehensive description of the architectural techniques used to tackle the soft error problem. It covers new methodologies for quantitative analysis of soft errors as well as novel, cost-effective architectural techniques to mitigate them.
To give readers a better grasp of the broader problem definition and solution space, this book also delves into the physics of soft errors and reviews current circuit and software mitigation techniques. The book can be read or used in a course in several ways: as a complete course on architecture design for soft errors covering the entire book, as a short course on architecture design for soft errors, or as a reference book on classical fault-tolerant machines.
This book is recommended for practitioners in the semiconductor industry, for researchers and developers in computer architecture, for advanced graduate seminar courses on soft errors, and as a reference book for undergraduate courses in computer architecture.
- Helps readers build fault tolerance into the billions of microchips produced each year, all of which are subject to soft errors
- Shows readers how to quantify their soft error reliability
- Provides state-of-the-art techniques to protect against soft errors
Table of contents
- Cover
- Title page
- Table of Contents
- Copyright
- Dedication
- Foreword
- Preface
- Chapter 1: Introduction
- 1.1 Overview
- 1.2 Faults
- 1.3 Errors
- 1.4 Metrics
- 1.5 Dependability Models
- 1.6 Permanent Faults in Complementary Metal Oxide Semiconductor Technology
- 1.7 Radiation-Induced Transient Faults in CMOS Transistors
- 1.8 Architectural Fault Models for Alpha Particle and Neutron Strikes
- 1.9 Silent Data Corruption and Detected Unrecoverable Error
- 1.10 Soft Error Scaling Trends
- 1.11 Summary
- 1.12 Historical Anecdote
- Chapter 2: Device- and Circuit-Level Modeling, Measurement, and Mitigation
- Chapter 3: Architectural Vulnerability Analysis
- 3.1 Overview
- 3.2 AVF Basics
- 3.3 Does a Bit Matter?
- 3.4 SDC and DUE Equations
- 3.5 ACE Principles
- 3.6 Microarchitectural Un-ACE Bits
- 3.7 Architectural Un-ACE Bits
- 3.8 AVF Equations for a Hardware Structure
- 3.9 Computing AVF with Little’s Law
- 3.10 Computing AVF with a Performance Model
- 3.11 ACE Analysis Using the Point-of-Strike Fault Model
- 3.12 ACE Analysis Using the Propagated Fault Model
- 3.13 Summary
- 3.14 Historical Anecdote
- Chapter 4: Advanced Architectural Vulnerability Analysis
- 4.1 Overview
- 4.2 Lifetime Analysis of RAM Arrays
- 4.3 Lifetime Analysis of CAM Arrays
- 4.4 Effect of Cooldown in Lifetime Analysis
- 4.5 AVF Results for Cache, Data Translation Buffer, and Store Buffer
- 4.6 Computing AVFs Using SFI into an RTL Model
- 4.7 Case Study of SFI
- 4.8 Summary
- 4.9 Historical Anecdote
- Chapter 5: Error Coding Techniques
- 5.1 Overview
- 5.2 Fault Detection and ECC for State Bits
- 5.3 Error Detection Codes for Execution Units
- 5.4 Implementation Overhead of Error Detection and Correction Codes
- 5.5 Scrubbing Analysis
- 5.6 Detecting False Errors
- 5.7 Hardware Assertions
- 5.8 Machine Check Architecture
- 5.9 Summary
- 5.10 Historical Anecdote
- Chapter 6: Fault Detection via Redundant Execution
- 6.1 Overview
- 6.2 Sphere of Replication
- 6.3 Fault Detection via Cycle-by-Cycle Lockstepping
- 6.4 Lockstepping in the Hewlett-Packard NonStop Himalaya Architecture
- 6.5 Lockstepping in the IBM Z-series Processors
- 6.6 Fault Detection via RMT
- 6.7 RMT in the Marathon Endurance Server
- 6.8 RMT in the Hewlett-Packard NonStop® Advanced Architecture
- 6.9 RMT Within a Single-Processor Core
- 6.10 RMT in a Multicore Architecture
- 6.11 DIVA: RMT Using Specialized Checker Processor
- 6.12 RMT Enhancements
- 6.13 Summary
- 6.14 Historical Anecdote
- Chapter 7: Hardware Error Recovery
- 7.1 Overview
- 7.2 Classification of Hardware Error Recovery Schemes
- 7.3 Forward Error Recovery
- 7.4 Backward Error Recovery with Fault Detection before Register Commit
- 7.5 Backward Error Recovery with Fault Detection before Memory Commit
- 7.6 Backward Error Recovery with Fault Detection before I/O Commit
- 7.7 Backward Error Recovery with Fault Detection after I/O Commit
- 7.8 Summary
- 7.9 Historical Anecdote
- Chapter 8: Software Detection and Recovery
- Index
Product information
- Title: Architecture Design for Soft Errors
- Author(s):
- Release date: August 2011
- Publisher(s): Morgan Kaufmann
- ISBN: 9780080558325