High Availability And Failover

Buzzwords 2.0: High Availability and Failover

Downtime. The word strikes dread into the core of an IT professional. Downtime creates an entrance into a labyrinth of longer maintenance hours, impatient managers and the possibility of unwanted overtime.

Growing businesses are demanding reliable infrastructure that reduces downtime and points of failure, making the terms “high availability” and “failover” more popular than ever. Beyond the buzzwords, what are the tangible implications for managers or business owners?

Availability Environments

In the IT industry, the term availability generally refers to the share of time one can expect a service to be available. What makes infrastructure “highly available”? The Harvard Research Group breaks down availability environments into five classifications: AE0-AE4.

AE0 is a conventional system, with some shutdowns or potential for lost data. An AE4 environment provides 24/7 operation, with no data lost and any failures transparent to the user. Though there is no industry-specific rating for high availability, the Harvard ratings help create a general framework for the availability characteristic. A high availability system falls roughly around the AE2 and AE3 ratings.

Availability is measured as a percentage. An availability of 99% means up to 1% downtime. In a 365-day year, this means 3.65 days of downtime. Many high availability systems aim for around 99.99% or 99.999% availability. Though there is technically no such thing as 100% availability, high availability systems have minimal downtime compared to archaic or unreliable systems.
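To make those percentages concrete, here is a small Python sketch of the arithmetic. It is only an illustration: the function name downtime_per_year is made up for this example, and it assumes the same 365-day year used above.

```python
from datetime import timedelta


def downtime_per_year(availability_percent: float, days_in_year: int = 365) -> timedelta:
    """Return the yearly downtime implied by an availability percentage.

    Hypothetical helper for illustration; assumes a 365-day year by default.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(days=days_in_year * unavailable_fraction)


# Compare a few common availability targets.
for pct in (99, 99.9, 99.99, 99.999):
    print(f"{pct}% available -> about {downtime_per_year(pct)} of downtime per year")
```

At 99% the result is the 3.65 days mentioned above. At 99.99% it shrinks to roughly 53 minutes a year, and at 99.999% to a little over 5 minutes.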

The Business Take Away: The better the availability environment, the less downtime and expense, and the more overall functionality, your infrastructure will have.

The Components of High Availability

High availability is made possible by fault tolerance and redundancy. Fault tolerance is the ability of a system to trigger a backup procedure or component to take over when a component fails. This means little-to-no loss of service. Fault tolerance is typically established with some mix of software and hardware. Essentially, fault tolerance keeps the service available to users even when individual components fail.

When deploying a high availability system, part of creating high fault tolerance is identifying and eliminating single points of failure (SPOFs). A SPOF is a system component that, when it fails, disrupts the entire system, rendering it unreliable or unavailable. Mitigating SPOFs involves using redundancy to ensure that one failed component doesn’t take the entire system down.

From the Department of Redundancy Department

Using redundant server components and replication is a common approach to mitigating SPOFs. Redundancy essentially serves as a backup or fail-safe, while contributing to the overall availability of a system. Redundancy will either be passive or active.

Active redundancy involves two systems processing the exact same thing. Only one output is used so there is no redundancy in the end point. Both components are equally powerful, thus if a device fails, there is no change in capabilities from one device to another.

Passive redundancy allows one device to process the action, and the other to remain idle. The second device is ready to pick up upon failure. This does require a switching component that can change the input and output channel while processing a failure.

Redundancy mitigates SPOFs by providing a fail-safe if a single hardware component fails. Without redundancy, a single hardware failure can be fatal to a machine.
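The difference between the two styles is easier to see in code. The Python sketch below is a simplified illustration, not a real deployment: the function names and the healthy and broken components are invented for this example, and real systems do this in hardware or dedicated clustering software.

```python
from typing import Callable, Sequence

Handler = Callable[[str], str]  # a component that processes a request


def active_redundancy(request: str, replicas: Sequence[Handler]) -> str:
    """Every replica processes the same request; only one output is used."""
    outputs = []
    for replica in replicas:
        try:
            outputs.append(replica(request))
        except Exception:
            continue  # a failed replica is simply ignored
    if not outputs:
        raise RuntimeError("all replicas failed")
    return outputs[0]


def passive_redundancy(request: str, primary: Handler, standby: Handler) -> str:
    """Only the primary works; the idle standby picks up when the primary fails."""
    try:
        return primary(request)
    except Exception:
        return standby(request)  # the "switching component" in miniature


def healthy(request: str) -> str:
    return f"handled {request}"


def broken(request: str) -> str:
    raise ConnectionError("component is down")


print(active_redundancy("job-1", [broken, healthy]))   # prints "handled job-1"
print(passive_redundancy("job-2", broken, healthy))    # prints "handled job-2"
```

In both cases the caller still gets an answer even though one component is down, which is the whole point of removing the single point of failure.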

Implementing High Availability

What does high availability look like in action? For the purposes of this article, I’ll focus on high availability with respect to physical servers, as opposed to a virtual environment. On-premises high availability is used primarily for business continuity. Large amounts of traffic contribute to lagging networks, and can result in downtime.

To mitigate this, firms will employ a redundancy principle and deploy two servers, typically in a single location. At SmartFile, our FileHub solution is capable of operating in the same manner. The two servers run in active-active redundancy. Because both devices work in coordination, the traffic and processing burden is lifted from a single device. This improves the speed and effectiveness of the network, translating to minimal or no downtime.
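As a rough picture of how two active servers can share the load, here is a minimal round-robin sketch in Python. It is not how FileHub is implemented; the class name ActiveActivePair and the server names are assumptions made up for this illustration.

```python
from itertools import cycle
from typing import Sequence


class ActiveActivePair:
    """Hypothetical router that alternates requests between active servers."""

    def __init__(self, servers: Sequence[str]):
        self.healthy = {name: True for name in servers}
        self._rotation = cycle(servers)
        self._count = len(servers)

    def mark_down(self, name: str) -> None:
        self.healthy[name] = False

    def route(self) -> str:
        """Return the next healthy server, skipping any that are marked down."""
        for _ in range(self._count):
            candidate = next(self._rotation)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy server available")


pair = ActiveActivePair(["server-a", "server-b"])
print([pair.route() for _ in range(4)])  # traffic alternates between both servers

pair.mark_down("server-a")
print([pair.route() for _ in range(2)])  # all traffic now lands on server-b
```

While both servers are up, each carries roughly half of the requests. When one goes down, the other absorbs everything, so users see slower service at worst rather than an outage.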

High availability must be used when serving a critical business process. Take, for example, a construction firm that relies on a file server to house all of its project files. Internally, this is the lifeblood of the organization. Externally, subcontractors and remote employees need access to files on a time-sensitive basis. Downtime means delays and a direct impact on revenue.

A software company that uses SmartFile to collect log files or push software updates to their customers relies on accessibility. A customer that cannot upload log files or a support technician that cannot download the files is delaying a resolution to a problem. A slowdown in fixing problems can hinder business relationships. Loss of trust can result in the loss of a customer.

The Business Take Away: Not everyone needs high availability, but for some firms, it is a necessity. Firms that use SmartFile for time-sensitive files or mission-critical processes should invest in high availability. Downtime in these instances has a direct impact on revenue. Ultimately, high availability is generally deployed to complement an organizational IT strategy.

What is Failover?

Failover, an ever-popular buzzword, provides a contingency plan for services provided. Failover, like high availability, requires the use of two devices, but unlike high availability, failover typically places servers in disparate locations. When one fails, the other takes over. This is related to redundancy, the difference being that redundancy is the physical component, while failover is the mechanism to trigger the contingency plan.

Side Note: Failover vs. Switchover

There are two ways a contingency plan can be triggered and the device switch can occur. Both deliver the same desired outcome, and as a result, the term failover is often used to cover both meanings.

Technically, when the switch is automatic it is referred to as failover. If the switch requires human intervention (manual switching of devices, or approval before a contingency plan), it is known as switchover.
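The distinction comes down to whether a person sits between the failure and the switch. The short Python sketch below illustrates that decision; the function name and the operator_approves callback are hypothetical and stand in for whatever approval process a team actually uses.

```python
from typing import Callable


def resolve_primary_failure(automatic: bool, operator_approves: Callable[[], bool]) -> str:
    """Describe how traffic moves to the backup after the primary fails."""
    if automatic:
        return "failover"              # the system switches on its own
    if operator_approves():
        return "switchover"            # a person approved or performed the switch
    return "waiting for approval"      # nothing moves until someone signs off


print(resolve_primary_failure(automatic=True, operator_approves=lambda: False))
print(resolve_primary_failure(automatic=False, operator_approves=lambda: True))
```

The first call reports a failover and the second a switchover. The end state is the same in both cases: the backup device is serving traffic.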

What Triggers Failover?

Failover can be triggered in multiple ways. If the backup device senses a link down from the primary, a failover occurs. Failover can also be triggered by an anomaly in the device heartbeat. A heartbeat is a sensing mechanism that communicates between the primary and backup devices. If the primary device stops reacting to the heartbeat, the backup device takes over.
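A heartbeat check can be sketched in a few lines of Python. This is a simplified model rather than any vendor's implementation: the HeartbeatMonitor class, the three-second timeout, and the injectable clock are all assumptions chosen to keep the example short and testable.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical monitor running on the backup device."""

    def __init__(self, timeout_seconds: float = 3.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_heartbeat = clock()
        self.active_device = "primary"

    def record_heartbeat(self) -> None:
        """Call whenever the primary answers a heartbeat."""
        self.last_heartbeat = self.clock()

    def check(self) -> str:
        """Fail over to the backup if the primary has been silent too long."""
        silence = self.clock() - self.last_heartbeat
        if self.active_device == "primary" and silence > self.timeout_seconds:
            self.active_device = "backup"
        return self.active_device


# Simulated clock so the example runs instantly.
now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=3.0, clock=lambda: now[0])

now[0] = 2.0
monitor.record_heartbeat()
print(monitor.check())  # primary answered recently, so it stays active

now[0] = 6.0            # four seconds of silence
print(monitor.check())  # the backup takes over
```

Choosing the timeout is a trade-off. Too short, and a brief network hiccup triggers an unnecessary failover; too long, and users wait through an outage before the backup steps in.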

Implementing Failover

Generally, failover serves a strategy aimed at disaster recovery. Two devices are deployed in strategic, geographically diverse locations, with the second typically far away from the original office. This way, if a natural disaster or emergency renders the on-premises device unusable, the second device is out of harm’s way.

At SmartFile, we are able to deploy geographically diverse servers that leverage active-passive redundancy. When an anomaly is detected in the primary’s work pattern, workflow can be directed to the second device. For businesses that have to adapt to hurricane or tornado seasons, failover is a blessing. Downtime and lost revenue are not options, and failover keeps the business running smoothly.

Too Many Options, So Little Time

Deciding whether your business needs high availability, failover, or both is difficult. Ultimately, the decision(s) should complement your organizational IT strategy with a specific focus on the business process being addressed.

Larger firms are inherently focused on business continuity because of the broader impact. If disaster recovery is the focus, failover is your best option. High availability and failover are more than just tech buzzwords. They provide you with a more stable and fortified network.

Integrate High Availability and Failover

SmartFile is a business file management platform that gives you more control, compliance and security.