| Introduction | p. 1 |
| Audience | p. 2 |
| Roadmap of This Book | p. 4 |
| Real-World Examples | p. 8 |
| Elementary Concepts | p. 13 |
| Business Issues | p. 14 |
| Business Continuity as the Overall Goal | p. 16 |
| Regulatory Compliance and Risk Management | p. 16 |
| System and Outage Categorization | p. 17 |
| High Availability - Handling Minor Outages | p. 22 |
| Availability | p. 24 |
| Reliability | p. 25 |
| Serviceability | p. 25 |
| Disaster Recovery - Handling Major Outages | p. 26 |
| Quantifying Availability: 99.9... % and Reality | p. 29 |
| Service Level Agreements | p. 31 |
| Basic Approach: Robustness and Redundancy | p. 34 |
| Layered Solution with Multiple Precautions | p. 38 |
| Summary | p. 39 |
| Architecture | p. 41 |
| Objectives | p. 45 |
| Conceptual Model | p. 48 |
| System Model | p. 51 |
| System Design | p. 55 |
| Base Concepts | p. 55 |
| System Stack | p. 56 |
| Redundancy and Replication | p. 61 |
| Robustness and Simplicity | p. 74 |
| Virtualization | p. 77 |
| Solution Roadmap | p. 78 |
| List Failure Scenarios | p. 79 |
| Evaluate Failure Scenarios | p. 82 |
| Map Scenarios to Requirements | p. 82 |
| Design Solution | p. 85 |
| Review Selected Solution Against Scenarios | p. 86 |
| System Solution Patterns | p. 86 |
| System Implementation Process | p. 87 |
| Systems for All Process Steps | p. 87 |
| Use Case: SAP Server | p. 89 |
| Hardware | p. 99 |
| Components and Computer Systems | p. 104 |
| Disk Storage | p. 108 |
| RAID - Redundant Array of Independent Disks | p. 109 |
| Storage Systems | p. 119 |
| SAN vs. NAS | p. 124 |
| Journaling Is Essential for High Availability | p. 125 |
| Virtualization of Resources | p. 126 |
| Vendor Selection and Purchasing Decisions | p. 128 |
| System Installation | p. 132 |
| System Maintenance and Operations | p. 139 |
| Making Our Own Statistics | p. 142 |
| Operating Systems | p. 149 |
| Failover Clusters | p. 151 |
| How Does It Work? | p. 157 |
| Failover Cluster Implementation Experiences | p. 166 |
| Load-Balancing Clusters | p. 176 |
| Load-Balancing Approaches | p. 178 |
| Target Selection for Load Balancing | p. 181 |
| Cluster and Server Consolidation | p. 183 |
| Virtualization and Moore's Law | p. 183 |
| Host Virtualization | p. 184 |
| Databases and Middleware | p. 189 |
| Middleware Categories | p. 191 |
| Database Servers | p. 193 |
| High-Availability Options for Database Servers | p. 199 |
| Disaster Recovery for Databases | p. 204 |
| Web Servers | p. 205 |
| Application Servers | p. 208 |
| Messaging Servers | p. 213 |
| Applications | p. 215 |
| Integration in a Cluster on the Operating System Level | p. 217 |
| High Availability Through Middleware | p. 223 |
| High Availability From Scratch | p. 225 |
| Code Quality Is Important | p. 227 |
| Testing for High Availability | p. 229 |
| Infrastructure | p. 233 |
| Network | p. 234 |
| Network Devices | p. 238 |
| LAN Segments | p. 240 |
| Default Gateway | p. 248 |
| Routing in LANs and WANs | p. 252 |
| Firewalls and Network Address Translation | p. 258 |
| Network Design for Disaster Recovery | p. 264 |
| Infrastructure Services | p. 267 |
| Dynamic Host Configuration Protocol (DHCP) | p. 267 |
| Domain Name Service (DNS) | p. 271 |
| Directory Server | p. 276 |
| Backup and Restore | p. 283 |
| Monitoring | p. 284 |
| Disaster Recovery | p. 287 |
| Concepts | p. 289 |
| Approach | p. 291 |
| Conceptual Design | p. 292 |
| Scenarios for Major Outages | p. 293 |
| Disaster-Recovery Scope | p. 295 |
| Primary and Disaster-Recovery Sites | p. 297 |
| State Synchronization | p. 298 |
| Shared System, Hot or Cold Standby | p. 300 |
| Time to Recovery - Failback to the Primary Site | p. 303 |
| Solutions | p. 305 |
| Metro Cluster | p. 306 |
| Fast Restore | p. 309 |
| Application-Level or Middleware-Level Clustering | p. 309 |
| Application Data Mirroring | p. 310 |
| Disk Mirroring | p. 317 |
| Matching Configuration Changes | p. 317 |
| Disaster-Recovery Tests | p. 318 |
| Test Goals and Categories | p. 319 |
| Organizational Test Context | p. 321 |
| Quality Characteristics | p. 322 |
| Holistic View - What Is Needed Besides Technology? | p. 322 |
| Command Center and War Room | p. 323 |
| Disaster-Recovery Emergency Pack | p. 323 |
| A Prototypical Disaster-Recovery Project | p. 324 |
| System Identification - the Primary Site | p. 326 |
| Business Requirements and Project Goals | p. 331 |
| Business View | p. 333 |
| System Design | p. 336 |
| Implementation | p. 345 |
| Failover to Disaster-Recovery Site or Disaster-Recovery Systems | p. 351 |
| General Approach | p. 351 |
| Example Checklist for a Database Disaster-Recovery Server | p. 355 |
| Failback to the Primary System | p. 357 |
| Reliability Calculations and Statistics | p. 359 |
| Mathematical Basics | p. 360 |
| Mean Time Between Failures and Annual Failure Rate | p. 362 |
| Redundancy and Probability of Failures | p. 363 |
| RAID Configurations | p. 365 |
| Example Calculations | p. 372 |
| Reliability over Time - the Bathtub Curve | p. 374 |
| Data Centers | p. 377 |
| Room Installation | p. 378 |
| Heat and Fire Control | p. 381 |
| Power Control | p. 384 |
| Computer Setup | p. 386 |
| Service Support Processes | p. 387 |
| Incident Management | p. 388 |
| Problem Management | p. 389 |
| Configuration Management | p. 391 |
| Change Management | p. 394 |
| Release Management | p. 395 |
| Information Gathering and Reporting | p. 397 |
| References | p. 399 |
| Index | p. 401 |