AGEDB

The Critical Factor in Business Value Continuity: Failover

Published in AGEDB, 3 min read, Apr 29

Failover: A Critical Element for Business Continuity

Imagine your business without failover capabilities. What impact would disasters such as ransomware attacks, natural catastrophes, hardware failures, file corruption, human errors, and numerous other issues have? Without proper preventive measures, these disruptions can halt production and seriously degrade your business value.

What is Failover? Understanding the Basics

Failover is the process of switching essential workloads, systems, and applications to a standby or secondary site when the main site is down or unavailable. This makes it a core part of backup and disaster recovery.

This ensures business operations can continue almost or entirely uninterrupted, even during disasters or scheduled maintenance.

Why is Failover Crucial for Your Business?

Failover is vital for business continuity as it reduces or prevents total system failures, enhancing system resilience against single points of failure. This resilience ensures services continue despite component failures, contributing significantly to a business's uptime and reliability.

Failover is particularly important for mission-critical systems that must always be available. It allows employees to continue their work without interruption and ensures easy access to files and systems, even during unplanned outages. This is especially important for businesses with strict uptime requirements.

Failover Configurations: How to Set Them Up

High Availability (HA) can be set up in Active-Active or Active-Standby mode. Here is how AHM configures each:

Active-Active (Independent Disks)

In an Active-Active HA configuration, two or more nodes perform the same tasks simultaneously. This setup ensures that the workload is evenly distributed and balanced across all nodes, preventing any node from being overloaded. Active-Active clusters can utilize more nodes, thus improving throughput and response times. For the HA cluster to operate smoothly, each node's configuration and settings must be identical.
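To make the idea of balanced distribution concrete, here is a minimal sketch of round-robin dispatch across active nodes. It is not how AHM is implemented; the class `ActiveActiveCluster`, the node names, and the round-robin policy are all assumptions chosen for illustration.

```python
from itertools import cycle


class ActiveActiveCluster:
    """Toy dispatcher (hypothetical): every healthy node serves requests in turn."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)  # endless round-robin over all nodes

    def mark_down(self, node):
        """Stop routing to a node that has failed."""
        self.healthy.discard(node)

    def route(self):
        """Return the next healthy node in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes left")
        for node in self._ring:
            if node in self.healthy:
                return node


cluster = ActiveActiveCluster(["node-a", "node-b"])
first_four = [cluster.route() for _ in range(4)]   # alternates between both nodes
cluster.mark_down("node-a")
after_failure = [cluster.route() for _ in range(2)]  # only node-b remains
```

Because both nodes were already serving traffic, losing one only reduces capacity; the surviving node keeps answering without a promotion step.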

AHM (Agens High Availability Manager) manages and provides high availability for servers. It works with two types of servers: if there is an issue with the primary server, the secondary server takes over as the primary server and operates accordingly.

Primary, master, read/write server: This server allows data input, modification, deletion, and query

Secondary, slave, standby server: This server reflects changes made on the primary server

Active-Passive (Standby - Independent Disks)

Similar to an Active-Active cluster, an Active-Passive or Active-Standby High Availability (HA) configuration also consists of at least two nodes.

However, as the name suggests, not all nodes are active. In a typical two-node setup, one node is always active while the other remains passive or on standby, ready to take over if the active node fails. For a smooth failover process, both nodes must have identical settings.
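The two-node arrangement can be sketched as follows. This is an illustrative model only: `Node`, `ActivePassivePair`, and the `"failed"` role label are hypothetical names, not part of AHM.

```python
class Node:
    def __init__(self, name, role):
        self.name = name
        self.role = role    # "active", "standby", or "failed"
        self.alive = True


class ActivePassivePair:
    """Toy two-node pair (hypothetical): only the active node serves requests."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def serve(self, request):
        """Handle a request on the active node, failing over first if it is down."""
        if not self.active.alive:
            self.fail_over()
        return f"{self.active.name} handled {request}"

    def fail_over(self):
        """Swap roles so the standby becomes the active node."""
        if not self.standby.alive:
            raise RuntimeError("both nodes are down")
        self.active, self.standby = self.standby, self.active
        self.active.role = "active"
        self.standby.role = "failed"  # the old active node awaits repair


pair = ActivePassivePair(Node("node-1", "active"), Node("node-2", "standby"))
before = pair.serve("query-1")
pair.active.alive = False          # simulate a failure of the active node
after = pair.serve("query-2")
```

Unlike the Active-Active case, the standby does no client work until the swap, which is why identical settings on both nodes matter: the standby has to behave exactly like the node it replaces.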

Benefits of Effective Failover Strategies

Implementing a robust failover solution is critical for maintaining business continuity and preserving value, offering numerous benefits, including:

  • Guaranteed Business Continuity: Failover ensures that business operations continue as usual, even if disasters occur or key components go offline
  • Improved Uptime: The failover process quickly switches from the primary system to a redundant standby system when the main system is unavailable, minimizing downtime and allowing work to continue without interruption
  • Cost Savings: Implementing a failover solution reduces costs associated with downtime, such as lost revenue, productivity, opportunities, and brand reputation damage

Considerations for Implementing Efficient Failover

While properly implemented failover can benefit a company, it is essential to be aware of potential drawbacks and make efforts to mitigate them:

  • Cost: Setting up, managing, and monitoring a failover system involves significant costs, including hardware and software expenses. Ensuring that failover operations run smoothly and automatically may require substantial capital investment in high-bandwidth systems with synchronous data transmission capabilities
  • Expertise Needed: Just like the primary systems, failover systems require professional maintenance, testing, and validation to operate smoothly. If deploying and managing a failover system requires more expertise than is available in-house, businesses might need to rely on external experts, which can significantly increase costs

Activating Failover with AHM: A Step-by-Step Guide

AHM monitors the health of nodes within a cluster to detect failures and initiate failover. If an active node fails and cannot provide services, the HA solution detects this and switches the standby server to an active role.

During normal operation, the standby server, which is already running, replicates data from the active server. This minimizes downtime and allows the standby server to continue services seamlessly when the active database is down.

Steps Involved in Failover Process:

  1. AHM on each node detects a failure in the active database
  2. The standby AHM attempts to connect to the primary database
  3. The failover process initiates
  4. The most recently updated node executes a pre-promotion script
  5. Standby AHM promotes the standby database to primary
  6. AHM assigns a virtual IP address to the new leader node to normalize services
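The six steps above can be sketched as a single function. This is a simplified model under stated assumptions, not AHM's actual code: `DbNode`, `run_failover`, the `probe` callback, and the `last_applied` replication counter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class DbNode:
    name: str
    role: str                 # "primary", "standby", or "failed"
    reachable: bool = True
    last_applied: int = 0     # hypothetical replication position; higher is newer
    has_vip: bool = False


def run_failover(nodes, probe, log):
    """Walk through the six steps; probe(node) returns True if the node responds."""
    primary = next(n for n in nodes if n.role == "primary")

    # Step 1: a failure in the active database is detected.
    if probe(primary):
        return primary
    log.append("1: failure detected")

    # Step 2: the standby side retries the primary to rule out a transient glitch.
    if probe(primary):
        log.append("2: primary reachable again, no failover")
        return primary
    log.append("2: reconnect to primary failed")

    # Step 3: the failover process starts.
    candidates = [n for n in nodes if n.role == "standby" and probe(n)]
    if not candidates:
        raise RuntimeError("no reachable standby to promote")
    log.append("3: failover initiated")

    # Step 4: the most recently updated node runs the pre-promotion script.
    new_primary = max(candidates, key=lambda n: n.last_applied)
    log.append(f"4: pre-promotion script on {new_primary.name}")

    # Step 5: the standby database is promoted to primary.
    primary.role, primary.has_vip = "failed", False
    new_primary.role = "primary"
    log.append(f"5: {new_primary.name} promoted")

    # Step 6: the virtual IP moves to the new leader so clients reconnect.
    new_primary.has_vip = True
    log.append(f"6: VIP assigned to {new_primary.name}")
    return new_primary


nodes = [
    DbNode("db1", "primary", reachable=False, has_vip=True),
    DbNode("db2", "standby", last_applied=90),
    DbNode("db3", "standby", last_applied=120),
]
log = []
leader = run_failover(nodes, probe=lambda n: n.reachable, log=log)
```

In this run `db3` is chosen because it has applied the most changes, which keeps data loss to a minimum when more than one standby is available.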

Failover - Recovery and Reversion to Primary:

  1. The node affected by the failure is restored as a standby and can then be reverted to primary.
  2. The restored node calls the `ahm_promote_node` command to re-promote the node to the primary role.

AHM: The Ultimate Solution for Ensuring Stable Business Continuity

Architecture of AHM

AHM operates with components that monitor cluster nodes to detect failures and trigger failover. It includes:

Failover Process

  • Real-time monitoring of failures in systems, networks, storage, and applications
  • Consensus-building on decisions related to failover
  • Executing failover during failures

Heartbeat

  • Health checks between redundant nodes for Database and AHM
  • Provides a single connection point to client applications (using virtual IP)
  • Manages cluster status
  • Executes pre and post-scripts provided by the user
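A heartbeat health check usually comes down to tracking when each node was last heard from and flagging nodes that stay silent for too long. The sketch below shows that idea only; `HeartbeatMonitor`, the timeout value, and the explicit `now` timestamps are hypothetical and do not describe AHM's internals.

```python
class HeartbeatMonitor:
    """Toy heartbeat tracker (hypothetical): nodes silent past the timeout are down."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # node name -> timestamp of the last heartbeat

    def beat(self, node, now):
        """Record a heartbeat received from a node at time `now` (in seconds)."""
        self.last_seen[node] = now

    def down_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat("db1", now=0.0)
monitor.beat("db2", now=0.0)
monitor.beat("db2", now=4.0)          # db2 keeps reporting, db1 goes silent
silent = monitor.down_nodes(now=5.0)  # db1 has been quiet for 5 s, db2 for 1 s
```

Timestamps are passed in explicitly so the example is deterministic; a real monitor would read a monotonic clock instead.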

AgensSQL's HA Solution, Pgpool-II:

  • Supports high availability through the HA (High Availability) extension solution
  • Utilizes a distributed mechanism at the DB session level through the expansion of read-only nodes
  • Detects failures in primary or standby databases and automatically takes appropriate actions to ensure service continuity

Features of Pgpool-II:

  • Connection pool management: Enhances overall performance through the reuse of connections
  • Load balancing: Distributes queries across multiple servers if they hold the same data using the replication function
  • Failover: In the event of a failure in the Master Server, the Slave Server assumes its functions
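The load-balancing idea can be illustrated with a small router that sends writes to the primary and spreads reads across all servers holding the same data. This is a conceptual sketch, not Pgpool-II's implementation: `QueryRouter` is a hypothetical class, and the `SELECT` prefix check is a deliberately naive stand-in for the query analysis a real pooler has to perform.

```python
import itertools


class QueryRouter:
    """Toy read/write splitter (hypothetical), not Pgpool-II's actual logic."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Reads rotate over every server that holds the replicated data.
        self._readers = itertools.cycle([primary] + list(replicas))

    def route(self, sql):
        """Return the server that should run the statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")  # naive assumption
        return next(self._readers) if is_read else self.primary


router = QueryRouter("primary", ["standby1"])
targets = [
    router.route("SELECT 1"),
    router.route("select * from orders"),
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("SELECT 2"),
]
```

Writes always land on the primary because standbys only reflect changes made there, while reads can safely be served by any server with the same data.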

Agens High Availability Manager (AHM)

  • High Availability Components: AHM is a high-availability component developed by AGEDB to resolve Single Points of Failure (SPOF) in AgensSQL database servers
  • Fault Detection and Automatic Measures: Uses a distributed mechanism to detect failures in primary or standby databases and automatically takes appropriate measures to ensure service continuity
  • Guaranteed Availability: AHM is installed alongside each primary and standby database and ensures the availability of database servers for client applications. If a failure occurs in the primary database or system, AHM coordinates to designate one of the available standby database servers as the new primary and quickly restores services by moving the virtual IP (VIP) to the new primary server
  • Support for Stable System Operation: Provides features necessary for cluster operation such as failover handling, thus ensuring stable system operation and ease of system expansion

This session covered the critical component of failover in preserving business continuity and value. In our next session, we will delve into more technical content and share success stories of AGEDB’s failover implementations.

For more information about AGEDB's advanced technology and training or if you have any inquiries, please contact us at marketing@agedb.io for professional consultation.
