Preface | p. vii |
Acknowledgements | p. xiii |
Fundamental Concepts in Fault Tolerance and Reliability Analysis | p. 1 |
Introduction | p. 1 |
Redundancy Techniques | p. 4 |
Hardware Redundancy | p. 5 |
Passive (Static) Hardware Redundancy | p. 5 |
Active (Dynamic) Hardware Redundancy | p. 6 |
Hybrid Hardware Redundancy | p. 7 |
Software Redundancy | p. 9 |
Static Software Redundancy Techniques | p. 9 |
Dynamic Software Redundancy Techniques | p. 10 |
Information Redundancy | p. 12 |
Error Detecting Codes | p. 14 |
Error Correcting Codes | p. 18 |
SEC-DED Codes | p. 20 |
CRC Codes | p. 26 |
Convolution Codes | p. 27 |
Time Redundancy | p. 29 |
Permanent Error Detection with Time Redundancy | p. 30 |
Reliability Modeling and Evaluation | p. 33 |
Empirical Models | p. 34 |
The Analytical Technique | p. 34 |
Summary | p. 42 |
References | p. 42 |
Fault Modeling, Simulation and Diagnosis | p. 44 |
Fault Modeling | p. 44 |
Fault Simulation | p. 51 |
Fault Simulation Algorithms | p. 52 |
Serial Fault Simulation Algorithm | p. 52 |
Parallel Fault Simulation | p. 53 |
Deductive Fault Simulation | p. 54 |
Concurrent Fault Simulation | p. 57 |
Critical Path Tracing | p. 57 |
Fault Diagnosis | p. 59 |
Combinational Fault Diagnosis | p. 59 |
Sequential Fault Diagnosis Methods | p. 61 |
Summary | p. 64 |
References | p. 64 |
Error Control and Self-Checking Circuits | p. 66 |
Error-Detecting/Error-Correcting Codes | p. 67 |
Self-Checking Circuits | p. 81 |
Summary | p. 92 |
References | p. 92 |
Fault Tolerance in Multiprocessor Systems | p. 94 |
Fault Tolerance in Interconnection Networks | p. 95 |
Reliability and Fault Tolerance in Single Loop Architectures | p. 104 |
Introduction to Fault Tolerance in Hypercube Networks | p. 108 |
Introduction to Fault Tolerance in Mesh Networks | p. 120 |
Summary | p. 125 |
References | p. 126 |
Fault-Tolerant Routing in Multi-Computer Networks | p. 127 |
Introduction | p. 127 |
Fault-Tolerant Routing Algorithms in Hypercube | p. 131 |
Depth-First Search Approach | p. 131 |
Iterative-Based Heuristic Routing Algorithm | p. 135 |
Routing in Faulty Mesh Networks | p. 140 |
Node Labeling Technique | p. 140 |
A FT Routing Scheme for Meshes with Non-Convex Faults | p. 141 |
Algorithm Extensions | p. 147 |
Multidimensional Meshes | p. 147 |
Faults with f-Chains | p. 148 |
Summary | p. 149 |
References | p. 149 |
Fault Tolerance and Reliability in Hierarchical Interconnection Networks | p. 152 |
Introduction | p. 152 |
Block-Shift Network (BSN) | p. 154 |
BSN Edges Groups | p. 155 |
BSN Construction | p. 156 |
BSN Degree and Diameter | p. 158 |
BSN Connectivity | p. 158 |
BSN Fault Diameter | p. 159 |
BSN Reliability | p. 160 |
Hierarchical Cubic Network (HCN) | p. 161 |
HCN Degree and Diameter | p. 162 |
HINs versus HCNs | p. 163 |
Topological Cost | p. 163 |
The Hyper-Torus Network (HTN) | p. 166 |
Summary | p. 170 |
References | p. 170 |
Fault Tolerance and Reliability of Computer Networks | p. 172 |
Background Material | p. 173 |
Fault Tolerance in Loop Networks | p. 174 |
Reliability of Token-Ring Networks | p. 175 |
Reliability of Bypass-Switch Networks | p. 176 |
Double Loop Architectures | p. 176 |
Multi-Drop Architectures | p. 178 |
Daisy-Chain Architectures | p. 178 |
Reliability of General Graph Networks | p. 180 |
The Exact Method | p. 180 |
Reliability Bounding | p. 185 |
Topology Optimization of Networks Subject to Reliability & Fault Tolerance Constraints | p. 188 |
Enumeration Techniques | p. 189 |
Network Reliability | p. 195 |
Iterative Techniques | p. 199 |
Maximizing Network Reliability by Adding a Single Edge | p. 204 |
Design for Networks Reliability | p. 204 |
Summary | p. 205 |
References | p. 206 |
Fault Tolerance in High Speed Switching Networks | p. 208 |
Introduction | p. 208 |
Classification of Fault-Tolerant Switching Architectures | p. 212 |
One-Fault Tolerance Switch Architectures | p. 213 |
Extra-Stage Shuffle Exchange | p. 213 |
Itoh Network | p. 214 |
The B-Tree Network | p. 215 |
Benes Network | p. 216 |
Parallel Banyan Network | p. 217 |
Tagle & Sharma Network | p. 218 |
Two-Fault Tolerance Switch Architectures | p. 219 |
Binary Tree Banyan Network | p. 219 |
Logarithmic-Fault Tolerance | p. 220 |
RAZAN | p. 220 |
Logical Neighborhood | p. 222 |
Improved Logical Neighborhood | p. 223 |
Architecture-Dependent Fault Tolerance | p. 224 |
Summary | p. 226 |
References | p. 226 |
Fault Tolerance in Distributed and Mobile Computing Systems | p. 229 |
Introduction | p. 229 |
Background Material | p. 231 |
Checkpointing Techniques in Mobile Networks | p. 236 |
Minimal Snapshot Collection Algorithm | p. 237 |
Mutable Checkpoints | p. 239 |
Adaptive Recovery | p. 241 |
Message Logging Based Checkpoints | p. 243 |
Hybrid Checkpoints | p. 244 |
Comparison | p. 245 |
Summary | p. 247 |
References | p. 247 |
Fault Tolerance in Mobile Networks | p. 249 |
Background Material | p. 249 |
More on Mutable Checkpoint Techniques in Mobile Networks | p. 251 |
Handling Mobility, Disconnection and Reconnection of MHs | p. 252 |
A Checkpointing Algorithm Based on Mutable Checkpoints | p. 253 |
Performance Evaluation | p. 261 |
Hardware Approach for Fault Tolerance in Mobile Networks | p. 265 |
Summary | p. 273 |
References | p. 273 |
Reliability and Yield Enhancement of VLSI/WSI Circuits | p. 276 |
Defect and Failure in VLSI Circuits | p. 276 |
Yield and Defect Model in VLSI/WSI Circuits | p. 279 |
Techniques to Improve Yield | p. 284 |
Effect of Redundancy on Yield | p. 286 |
Summary | p. 288 |
References | p. 288 |
Design of Fault-Tolerant Processor Arrays | p. 291 |
Introduction | p. 291 |
Hardware Redundancy Techniques | p. 294 |
Self-Reconfiguration Techniques | p. 317 |
Summary | p. 321 |
References | p. 322 |
Algorithm-Based Fault Tolerance | p. 326 |
Checksum-Based ABFT for Matrix Operations | p. 327 |
Checksum-Based ABFT Error Handling | p. 330 |
Weighted Checksum Based ABFT | p. 331 |
ABFT on a Mesh Multiprocessor | p. 332 |
Checksum-Based ABFT on a Hypercube Multiprocessor | p. 334 |
Partition-Based ABFT for Floating-Point Matrix Operations | p. 336 |
Summary | p. 339 |
References | p. 339 |
System Level Diagnosis-I | p. 341 |
Background Material and Basic Terminology | p. 342 |
System-Level Diagnosis Models | p. 347 |
Diagnosable Systems | p. 352 |
Diagnosability Algorithms | p. 358 |
Centralized Diagnosis Systems | p. 359 |
Distributed Diagnosis Systems | p. 365 |
Summary | p. 372 |
References | p. 373 |
System Level Diagnosis-II | p. 378 |
Diagnosis Algorithms for Regular Structures | p. 378 |
Regular Structures | p. 379 |
Pessimistic One-Step Diagnosis Algorithms for Hypercube | p. 380 |
Diagnosis for Symmetric Multiple Processor Architecture | p. 383 |
Summary | p. 394 |
References | p. 394 |
Appendix | p. 397 |
Fault Tolerance and Reliability of the RAID Systems | p. 400 |
Introduction | p. 401 |
Redundancy Mechanisms | p. 403 |
Simple Reliability Analysis | p. 411 |
Advanced RAID Systems | p. 413 |
More on RAIDS | p. 418 |
Summary | p. 423 |
References | p. 423 |
High Availability in Computer Systems | p. 426 |
Introduction | p. 426 |
Tandem High Availability Computers at a Glance | p. 430 |
Availability in Client/Server Computing | p. 438 |
Chapter Summary | p. 440 |
References | p. 440 |