Fault-Tolerance Techniques for High-Performance Computing : Computer Communications and Networks - Author

Fault-Tolerance Techniques for High-Performance Computing

By: Author

eText | 1 July 2015

This text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). It opens with a detailed introduction to checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.

Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
