HPCUG Presentation

ACHIEVING HIGH AVAILABILITY

Paula S. McDaniel

Presenting Author: Paula S. McDaniel (paula_mcdaniel@hp.com)

Abstract

What is "availability," and how do you get it? International Data Corporation defines high availability as follows: "A computer is considered to be highly available if, when failure occurs, data is not lost, and the system can recover in a reasonable amount of time." This simple definition is a useful starting point for discussion. What counts as "a reasonable amount of time" differs from company to company, and from application to application within a company.

End users need reasonable, continuous access to systems that respond fast enough to accomplish the business function for which the systems were designed. Both planned and unplanned downtime detract from the availability of the application. Most users would request continuous application availability, but availability requirements must be business decisions. The more highly available a system needs to be, the more resources it takes and the more it costs. This added cost is justified only if there is sufficient value to the business.
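To make the cost trade-off concrete, it can help to express availability as a number. The following is a minimal sketch, not a formula taken from this presentation: it assumes availability is measured as the fraction of a period during which the application is usable, and it counts planned and unplanned downtime alike. The function names and the 8,760-hour year are illustrative assumptions.

```python
# Illustrative sketch: availability as the usable fraction of a period.
# All names and the default period are assumptions, not part of the source.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(planned_downtime_hours, unplanned_downtime_hours,
                 period_hours=HOURS_PER_YEAR):
    """Return the fraction of the period during which the application was up.

    Planned and unplanned downtime are added together, since both keep
    end users from reaching the application.
    """
    downtime = planned_downtime_hours + unplanned_downtime_hours
    if planned_downtime_hours < 0 or unplanned_downtime_hours < 0:
        raise ValueError("downtime cannot be negative")
    if downtime > period_hours:
        raise ValueError("downtime cannot exceed the period")
    return (period_hours - downtime) / period_hours


def allowed_downtime_hours(target_availability, period_hours=HOURS_PER_YEAR):
    """Return the total downtime budget implied by an availability target."""
    if not 0.0 <= target_availability <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target_availability) * period_hours


if __name__ == "__main__":
    # Example: 40 hours of planned maintenance plus 12 hours of outages.
    print(f"Availability: {availability(40, 12):.4%}")
    # Downtime budgets implied by a few hypothetical targets.
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} target allows "
              f"{allowed_downtime_hours(target):.2f} hours of downtime per year")
```

Comparing the downtime budget for each target against the business cost of an hour of downtime is one way to decide whether the next increment of availability is worth paying for.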

There is a mistaken belief in the computer industry that high availability is simply a hardware or software product that can be purchased, but no single product provides "high availability." In fact, achieving high availability requires more than investments in technology. Many factors must be considered when trying to achieve high levels of availability for mission-critical applications.

An availability infrastructure capable of supporting mission-critical applications must be built on a strong foundation. Three pillars form that foundation: a technology infrastructure (hardware, software, networks, and so on), support partnerships, and IT processes. Each pillar is equally important in achieving your desired availability goals, and investments must be made in each area. Many companies today make large investments in technology without regard to the other two pillars. This only shifts the cause of downtime from hardware and software to human error. Good IT processes can go a long way toward reducing human error, and strong support partnerships help keep downtime to a minimum when a failure does occur.