Parallel Computing Concepts and Terminology | |
Introduction | p. 3 |
Parallel Computing in Quantum Chemistry: Past and Present | p. 4 |
Trends in Hardware Development | p. 5 |
Moore's Law | p. 5 |
Clock Speed and Performance | p. 6 |
Bandwidth and Latency | p. 7 |
Supercomputer Performance | p. 8 |
Trends in Parallel Software Development | p. 10 |
Responding to Changes in Hardware | p. 10 |
New Algorithms and Methods | p. 10 |
New Programming Models | p. 12 |
References | p. 13 |
Parallel Computer Architectures | p. 17 |
Flynn's Classification Scheme | p. 17 |
Single-Instruction, Single-Data | p. 17 |
Single-Instruction, Multiple-Data | p. 18 |
Multiple-Instruction, Multiple-Data | p. 18 |
Network Architecture | p. 19 |
Direct and Indirect Networks | p. 19 |
Routing | p. 20 |
Network Performance | p. 23 |
Network Topology | p. 25 |
Crossbar | p. 26 |
Ring | p. 27 |
Mesh and Torus | p. 27 |
Hypercube | p. 28 |
Fat Tree | p. 28 |
Bus | p. 30 |
Ad Hoc Grid | p. 31 |
Node Architecture | p. 31 |
MIMD System Architecture | p. 34 |
Memory Hierarchy | p. 35 |
Persistent Storage | p. 35 |
Local Storage | p. 37 |
Network Storage | p. 37 |
Trends in Storage | p. 38 |
Reliability | p. 38 |
Homogeneity and Heterogeneity | p. 39 |
Commodity versus Custom Computers | p. 40 |
Further Reading | p. 42 |
References | p. 43 |
Communication via Message-Passing | p. 45 |
Point-to-Point Communication Operations | p. 46 |
Blocking Point-to-Point Operations | p. 46 |
Non-Blocking Point-to-Point Operations | p. 47 |
Collective Communication Operations | p. 49 |
One-to-All Broadcast | p. 50 |
All-to-All Broadcast | p. 51 |
All-to-One Reduction and All-Reduce | p. 54 |
One-Sided Communication Operations | p. 55 |
Further Reading | p. 56 |
References | p. 56 |
Multi-Threading | p. 59 |
Pitfalls of Multi-Threading | p. 61 |
Thread-Safety | p. 64 |
Comparison of Multi-Threading and Message-Passing | p. 65 |
Hybrid Programming | p. 66 |
Further Reading | p. 69 |
References | p. 70 |
Parallel Performance Evaluation | p. 71 |
Network Performance Characteristics | p. 71 |
Performance Measures for Parallel Programs | p. 74 |
Speedup and Efficiency | p. 74 |
Scalability | p. 79 |
Performance Modeling | p. 80 |
Modeling the Execution Time | p. 80 |
Performance Model Example: Matrix-Vector Multiplication | p. 83 |
Presenting and Evaluating Performance Data: A Few Caveats | p. 86 |
Further Reading | p. 90 |
References | p. 90 |
Parallel Program Design | p. 93 |
Distribution of Work | p. 94 |
Static Task Distribution | p. 95 |
Round-Robin and Recursive Task Distributions | p. 96 |
Dynamic Task Distribution | p. 99 |
Manager-Worker Model | p. 99 |
Decentralized Task Distribution | p. 101 |
Distribution of Data | p. 101 |
Designing a Communication Scheme | p. 104 |
Using Collective Communication | p. 104 |
Using Point-to-Point Communication | p. 105 |
Design Example: Matrix-Vector Multiplication | p. 107 |
Using a Row-Distributed Matrix | p. 108 |
Using a Block-Distributed Matrix | p. 109 |
Summary of Key Points of Parallel Program Design | p. 112 |
Further Reading | p. 114 |
References | p. 114 |
Applications of Parallel Programming in Quantum Chemistry | |
Two-Electron Integral Evaluation | p. 117 |
Basics of Integral Computation | p. 117 |
Parallel Implementation Using Static Load Balancing | p. 119 |
Parallel Algorithms Distributing Shell Quartets and Pairs | p. 119 |
Performance Analysis | p. 121 |
Determination of the Load Imbalance Factor k(p) | p. 122 |
Determination of μ and σ for Integral Computation | p. 123 |
Predicted and Measured Efficiencies | p. 124 |
Parallel Implementation Using Dynamic Load Balancing | p. 125 |
Parallel Algorithm Distributing Shell Pairs | p. 126 |
Performance Analysis | p. 128 |
Load Imbalance | p. 128 |
Communication Time | p. 128 |
Predicted and Measured Efficiencies | p. 129 |
References | p. 130 |
The Hartree-Fock Method | p. 131 |
The Hartree-Fock Equations | p. 131 |
The Hartree-Fock Procedure | p. 133 |
Parallel Fock Matrix Formation with Replicated Data | p. 135 |
Parallel Fock Matrix Formation with Distributed Data | p. 138 |
Further Reading | p. 145 |
References | p. 146 |
Second-Order Møller-Plesset Perturbation Theory | p. 147 |
The Canonical MP2 Equations | p. 147 |
A Scalar Direct MP2 Algorithm | p. 149 |
Parallelization with Minimal Modifications | p. 151 |
High-Performance Parallelization | p. 154 |
Performance of the Parallel Algorithms | p. 158 |
Further Reading | p. 164 |
References | p. 164 |
Local Møller-Plesset Perturbation Theory | p. 167 |
The LMP2 Equations | p. 167 |
A Scalar LMP2 Algorithm | p. 169 |
Parallel LMP2 | p. 170 |
Two-Electron Integral Transformation | p. 171 |
Computation of the Residual | p. 173 |
Parallel Performance | p. 174 |
References | p. 177 |
Appendices | |
A Brief Introduction to MPI | p. 181 |
Pthreads: Explicit Use of Threads | p. 189 |
OpenMP: Compiler Extensions for Multi-Threading | p. 195 |
Index | p. 205 |