| Section | Page |
|---|---|
| Preface | p. vii |
| Acknowledgements | p. xiii |
| Fundamental Concepts in Fault Tolerance and Reliability Analysis | p. 1 |
| Introduction | p. 1 |
| Redundancy Techniques | p. 4 |
| Hardware Redundancy | p. 5 |
| Passive (Static) Hardware Redundancy | p. 5 |
| Active (Dynamic) Hardware Redundancy | p. 6 |
| Hybrid Hardware Redundancy | p. 7 |
| Software Redundancy | p. 9 |
| Static Software Redundancy Techniques | p. 9 |
| Dynamic Software Redundancy Techniques | p. 10 |
| Information Redundancy | p. 12 |
| Error Detecting Codes | p. 14 |
| Error Correcting Codes | p. 18 |
| SEC-DED Codes | p. 20 |
| CRC Codes | p. 26 |
| Convolutional Codes | p. 27 |
| Time Redundancy | p. 29 |
| Permanent Error Detection with Time Redundancy | p. 30 |
| Reliability Modeling and Evaluation | p. 33 |
| Empirical Models | p. 34 |
| The Analytical Technique | p. 34 |
| Summary | p. 42 |
| References | p. 42 |
| Fault Modeling, Simulation and Diagnosis | p. 44 |
| Fault Modeling | p. 44 |
| Fault Simulation | p. 51 |
| Fault Simulation Algorithms | p. 52 |
| Serial Fault Simulation Algorithm | p. 52 |
| Parallel Fault Simulation | p. 53 |
| Deductive Fault Simulation | p. 54 |
| Concurrent Fault Simulation | p. 57 |
| Critical Path Tracing | p. 57 |
| Fault Diagnosis | p. 59 |
| Combinational Fault Diagnosis | p. 59 |
| Sequential Fault Diagnosis Methods | p. 61 |
| Summary | p. 64 |
| References | p. 64 |
| Error Control and Self-Checking Circuits | p. 66 |
| Error-Detecting/Error-Correcting Codes | p. 67 |
| Self-Checking Circuits | p. 81 |
| Summary | p. 92 |
| References | p. 92 |
| Fault Tolerance in Multiprocessor Systems | p. 94 |
| Fault Tolerance in Interconnection Networks | p. 95 |
| Reliability and Fault Tolerance in Single Loop Architectures | p. 104 |
| Introduction to Fault Tolerance in Hypercube Networks | p. 108 |
| Introduction to Fault Tolerance in Mesh Networks | p. 120 |
| Summary | p. 125 |
| References | p. 126 |
| Fault-Tolerant Routing in Multi-Computer Networks | p. 127 |
| Introduction | p. 127 |
| Fault-Tolerant Routing Algorithms in Hypercube | p. 131 |
| Depth-First Search Approach | p. 131 |
| Iterative-Based Heuristic Routing Algorithm | p. 135 |
| Routing in Faulty Mesh Networks | p. 140 |
| Node Labeling Technique | p. 140 |
| A Fault-Tolerant Routing Scheme for Meshes with Non-Convex Faults | p. 141 |
| Algorithm Extensions | p. 147 |
| Multidimensional Meshes | p. 147 |
| Faults with f-Chains | p. 148 |
| Summary | p. 149 |
| References | p. 149 |
| Fault Tolerance and Reliability in Hierarchical Interconnection Networks | p. 152 |
| Introduction | p. 152 |
| Block-Shift Network (BSN) | p. 154 |
| BSN Edge Groups | p. 155 |
| BSN Construction | p. 156 |
| BSN Degree and Diameter | p. 158 |
| BSN Connectivity | p. 158 |
| BSN Fault Diameter | p. 159 |
| BSN Reliability | p. 160 |
| Hierarchical Cubic Network (HCN) | p. 161 |
| HCN Degree and Diameter | p. 162 |
| HINs versus HCNs | p. 163 |
| Topological Cost | p. 163 |
| The Hyper-Torus Network (HTN) | p. 166 |
| Summary | p. 170 |
| References | p. 170 |
| Fault Tolerance and Reliability of Computer Networks | p. 172 |
| Background Material | p. 173 |
| Fault Tolerance in Loop Networks | p. 174 |
| Reliability of Token-Ring Networks | p. 175 |
| Reliability of Bypass-Switch Networks | p. 176 |
| Double Loop Architectures | p. 176 |
| Multi-Drop Architectures | p. 178 |
| Daisy-Chain Architectures | p. 178 |
| Reliability of General Graph Networks | p. 180 |
| The Exact Method | p. 180 |
| Reliability Bounding | p. 185 |
| Topology Optimization of Networks Subject to Reliability & Fault Tolerance Constraints | p. 188 |
| Enumeration Techniques | p. 189 |
| Network Reliability | p. 195 |
| Iterative Techniques | p. 199 |
| Maximizing Network Reliability by Adding a Single Edge | p. 204 |
| Design for Network Reliability | p. 204 |
| Summary | p. 205 |
| References | p. 206 |
| Fault Tolerance in High Speed Switching Networks | p. 208 |
| Introduction | p. 208 |
| Classification of Fault-Tolerant Switching Architectures | p. 212 |
| One-Fault Tolerance Switch Architectures | p. 213 |
| Extra-Stage Shuffle Exchange | p. 213 |
| Itoh Network | p. 214 |
| The B-Tree Network | p. 215 |
| Benes Network | p. 216 |
| Parallel Banyan Network | p. 217 |
| Tagle & Sharma Network | p. 218 |
| Two-Fault Tolerance Switch Architectures | p. 219 |
| Binary Tree Banyan Network | p. 219 |
| Logarithmic-Fault Tolerance | p. 220 |
| RAZAN | p. 220 |
| Logical Neighborhood | p. 222 |
| Improved Logical Neighborhood | p. 223 |
| Architecture-Dependent Fault Tolerance | p. 224 |
| Summary | p. 226 |
| References | p. 226 |
| Fault Tolerance in Distributed and Mobile Computing Systems | p. 229 |
| Introduction | p. 229 |
| Background Material | p. 231 |
| Checkpointing Techniques in Mobile Networks | p. 236 |
| Minimal Snapshot Collection Algorithm | p. 237 |
| Mutable Checkpoints | p. 239 |
| Adaptive Recovery | p. 241 |
| Message Logging Based Checkpoints | p. 243 |
| Hybrid Checkpoints | p. 244 |
| Comparison | p. 245 |
| Summary | p. 247 |
| References | p. 247 |
| Fault Tolerance in Mobile Networks | p. 249 |
| Background Material | p. 249 |
| More on Mutable Checkpoint Techniques in Mobile Networks | p. 251 |
| Handling Mobility, Disconnection and Reconnection of MHs | p. 252 |
| A Checkpointing Algorithm Based on Mutable Checkpoints | p. 253 |
| Performance Evaluation | p. 261 |
| Hardware Approach for Fault Tolerance in Mobile Networks | p. 265 |
| Summary | p. 273 |
| References | p. 273 |
| Reliability and Yield Enhancement of VLSI/WSI Circuits | p. 276 |
| Defect and Failure in VLSI Circuits | p. 276 |
| Yield and Defect Model in VLSI/WSI Circuits | p. 279 |
| Techniques to Improve Yield | p. 284 |
| Effect of Redundancy on Yield | p. 286 |
| Summary | p. 288 |
| References | p. 288 |
| Design of Fault-Tolerant Processor Arrays | p. 291 |
| Introduction | p. 291 |
| Hardware Redundancy Techniques | p. 294 |
| Self-Reconfiguration Techniques | p. 317 |
| Summary | p. 321 |
| References | p. 322 |
| Algorithm-Based Fault Tolerance | p. 326 |
| Checksum-Based ABFT for Matrix Operations | p. 327 |
| Checksum-Based ABFT Error Handling | p. 330 |
| Weighted Checksum Based ABFT | p. 331 |
| ABFT on a Mesh Multiprocessor | p. 332 |
| Checksum-Based ABFT on a Hypercube Multiprocessor | p. 334 |
| Partition-Based ABFT for Floating-Point Matrix Operations | p. 336 |
| Summary | p. 339 |
| References | p. 339 |
| System Level Diagnosis-I | p. 341 |
| Background Material and Basic Terminology | p. 342 |
| System-Level Diagnosis Models | p. 347 |
| Diagnosable Systems | p. 352 |
| Diagnosability Algorithms | p. 358 |
| Centralized Diagnosis Systems | p. 359 |
| Distributed Diagnosis Systems | p. 365 |
| Summary | p. 372 |
| References | p. 373 |
| System Level Diagnosis-II | p. 378 |
| Diagnosis Algorithms for Regular Structures | p. 378 |
| Regular Structures | p. 379 |
| Pessimistic One-Step Diagnosis Algorithms for Hypercube | p. 380 |
| Diagnosis for Symmetric Multiple Processor Architecture | p. 383 |
| Summary | p. 394 |
| References | p. 394 |
| Appendix | p. 397 |
| Fault Tolerance and Reliability of the RAID Systems | p. 400 |
| Introduction | p. 401 |
| Redundancy Mechanisms | p. 403 |
| Simple Reliability Analysis | p. 411 |
| Advanced RAID Systems | p. 413 |
| More on RAIDs | p. 418 |
| Summary | p. 423 |
| References | p. 423 |
| High Availability in Computer Systems | p. 426 |
| Introduction | p. 426 |
| Tandem High Availability Computers at a Glance | p. 430 |
| Availability in Client/Server Computing | p. 438 |
| Chapter Summary | p. 440 |
| References | p. 440 |