PostgreSQL: High Availability, Disaster Recovery and Fault Tolerance

Caroline Pickens

August 12, 2024

Learn how to manage database servers, data storage, and backups for emergencies. Ensure business continuity with high availability PostgreSQL, disaster recovery, and fault-tolerant design.

EDB is proud to be part of a vibrant open source community, and our experts have been busy attending Postgres conferences and community meetups around the world.

In April, EDB Field CTO-APJ Ajit Gadge gave a talk at FOSSASIA Summit 2024 about high availability, disaster recovery, and fault tolerance with PostgreSQL. Ensuring business continuity with high availability database systems has always been an important and interesting topic for companies embracing Postgres and digital transformation.
Below is a summary of what he presented.

The Problem with Relying on One Database Server

Downtime is unacceptable for modern enterprises, especially those that rely heavily on digital infrastructure. Imagine relying on one database server for all your operations. Initially, it seems efficient and convenient – queries go in and results come out. The real challenge arises when that solitary server encounters a problem. Suddenly, everything grinds to a halt: no emails, no website, no transactions, and no productivity.

Organizations need high availability, disaster recovery processes, and a fault-tolerant design to protect their data.

High availability: High availability (HA) is a system characteristic that establishes an agreed level of operational performance, usually uptime, for a higher-than-normal period.
Disaster recovery: Disaster recovery (DR) is the process of maintaining or re-establishing vital infrastructure and systems after a natural or human-induced disaster.
Fault tolerance: A fault-tolerant design enables a system to keep operating, possibly at a reduced level, rather than failing completely when some part of the system malfunctions.

High availability relies on redundancy – having a backup server ready to take over in case of failure. Downtime becomes an inconvenience rather than a blow to productivity.
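The arithmetic behind redundancy is worth spelling out. The sketch below is a simplified model, not a sizing tool: it assumes every server fails independently and that a standby takes over instantly. Under those assumptions the service is down only when all servers are down at the same time. The function name and the 99% per-server figure are illustrative.

```python
def parallel_availability(node_availability: float, servers: int) -> float:
    """Availability of `servers` interchangeable servers, where the service
    stays up as long as at least one of them is up."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if servers < 1:
        raise ValueError("need at least one server")
    # Assumes failures are independent: the service is down only when
    # every server happens to be down at the same time.
    all_down = (1.0 - node_availability) ** servers
    return 1.0 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} server(s): {parallel_availability(0.99, n):.6f}")
```

With 99% availability per server, one server gives 0.990000, two give 0.999900, and three give 0.999999. Real failures are often correlated (a shared power feed or data center can take out every server at once), which is why independent power and geographically separate sites matter as much as the server count.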

But redundancy alone isn't sufficient. To achieve high availability and fast disaster recovery, every standby server must be kept in sync with the latest data. This is where replication matters. Whether replication is physical or logical, maintaining data consistency across multiple servers ensures a smooth transition in case of failure. Automated failover is the final piece of the puzzle. Tools such as EDB Failover Manager (EFM), repmgr, and Patroni redirect operations to a standby server without human intervention, keeping disruption and downtime to a minimum.
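To make "automated failover" concrete, here is a minimal sketch of the decision such a tool has to make. It is not how EFM, repmgr, or Patroni are implemented; the `Standby` class, `choose_failover_target` function, and the three-check threshold are hypothetical. It waits for several consecutive failed health checks of the primary, then picks the healthy standby that has replayed the most WAL, so the least data is lost.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    name: str
    replayed_lsn: int  # how far this standby has replayed the primary's WAL
    healthy: bool = True


def choose_failover_target(primary_check_results, standbys, max_failures=3):
    """Return the standby to promote, or None if no failover is needed.

    primary_check_results: recent health checks of the primary, oldest first
    (True means the primary answered).
    """
    recent = primary_check_results[-max_failures:]
    # Require several consecutive failures so a single dropped packet
    # does not trigger a promotion.
    if len(recent) < max_failures or any(recent):
        return None
    candidates = [s for s in standbys if s.healthy]
    if not candidates:
        return None
    # Promote the standby that is furthest ahead to minimize data loss.
    return max(candidates, key=lambda s: s.replayed_lsn)
```

Production tools add steps this sketch leaves out, most importantly fencing the old primary so that two servers never accept writes at the same time.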

Many details go into maintaining high uptime. These systems are designed to account for hardware failures, natural disasters, and power outages. From uninterruptible power supplies (UPS) to servers distributed across different data centers, each safeguard helps preserve operational continuity.

Why High Availability PostgreSQL is Critical

The stakes are high for businesses in today's digital landscape. Every minute of downtime means potential revenue loss and reputational damage. Through robust high availability database systems, organizations safeguard themselves against such risks. Whether it's a small enterprise or a multinational corporation, the principles remain the same: prioritize uptime, mitigate risks, and ensure seamless operations.

High availability isn't just about data storage and analytics; it's about safeguarding businesses in an increasingly interconnected world. As technology evolves, so must our strategies for maintaining uninterrupted operations. With robust high availability systems in place, businesses can weather any storm, confident that their data and operations will remain resilient.

High Availability PostgreSQL: FAQ

What are high availability databases?

High availability is a system characteristic that establishes an agreed level of operational performance, usually uptime, for a higher-than-normal period.

High availability databases rely on redundancy: a standby server is kept ready to take over seamlessly if the primary fails. Downtime becomes a brief inconvenience.

Why are high availability databases a necessity?

Despite advances in hardware, network, and database technology, many organizations still risk serious database failures. Sixty percent of data operations have experienced an outage in the past three years, and 60% of those outages disrupted productivity for four to 48 hours. The cost is significant: 70% of outages result in total losses ranging from over $100,000 to over $1 million.

Businesses must consider high availability databases and architecture that ensure maximum reliability and continuity.

How is high availability measured?

This is usually done by defining and committing to a specific uptime in your service level agreement (SLA). The “three 9s, four 9s, or five 9s” figures refer to availability percentages of 99.9%, 99.99%, and 99.999%; each corresponds to a maximum amount of time the system may be unavailable over the agreed period.
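A short calculation shows what each level of nines allows. This sketch assumes a 365-day year and measures downtime in minutes; the function and constant names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime allowed over a period for a given uptime percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    for label, pct in (("three 9s", 99.9), ("four 9s", 99.99), ("five 9s", 99.999)):
        print(f"{label} ({pct}%): {downtime_budget_minutes(pct):.2f} minutes/year")
```

Three 9s allows about 525.6 minutes (roughly 8.76 hours) of downtime per year, four 9s about 52.6 minutes, and five 9s only about 5.3 minutes. Each additional nine cuts the budget by a factor of ten.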

When should I consider high availability architecture?

Here's how you can decide what high availability option is best for your business.

Determine the level of availability you hope to achieve
Understand your operational budget
Know the cost to your business if there is downtime in the data persistence tier
Understand your RPO (Recovery Point Objective)
Know your RTO (Recovery Time Objective)
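The availability target and the RTO constrain each other. As a rough sketch, assuming every outage lasts exactly as long as your RTO (the function name is hypothetical), you can estimate how many incidents per year your downtime budget can absorb:

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def max_incidents_per_year(availability_percent: float, rto_minutes: float) -> int:
    """How many outages per year fit in the downtime budget if every outage
    lasts exactly the recovery time objective (RTO)."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    if rto_minutes <= 0:
        raise ValueError("RTO must be positive")
    budget = MINUTES_PER_YEAR * (1 - availability_percent / 100)
    return math.floor(budget / rto_minutes)
```

For example, a four-9s target with a one-minute RTO leaves room for 52 incidents a year, while a five-9s target with a ten-minute RTO cannot absorb even one. In the second case the architecture needs a faster, automated failover, not just a better SLA.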

How does high availability PostgreSQL work?

High availability PostgreSQL deployments are typically built on one of two replication modes:

Streaming replication

The replica connects to the primary and continuously receives a stream of write-ahead log (WAL) records. This keeps the replica more up to date with the primary than file-based log-shipping replication, which transfers WAL only one completed segment file at a time.
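The difference can be seen in terms of log sequence numbers (LSNs), the WAL positions PostgreSQL reports as text such as `0/3000060`. This sketch parses that format and models a log-shipping replica as stopping at the last completed segment boundary, assuming the default 16 MB WAL segment size. It ignores `archive_timeout` and file copy time, and the function names are illustrative.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL's default WAL segment size


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(position: int) -> str:
    """Convert a byte position back into PostgreSQL's 'X/Y' text form."""
    return f"{position >> 32:X}/{position & 0xFFFFFFFF:X}"


def log_shipping_position(primary_lsn: int) -> int:
    """File-based log shipping replays only completed WAL segments, so the
    replica can be no further ahead than the last segment boundary."""
    return (primary_lsn // WAL_SEGMENT_SIZE) * WAL_SEGMENT_SIZE


def lag_bytes(primary_lsn: int, replica_lsn: int) -> int:
    return primary_lsn - replica_lsn
```

If the primary is at `0/3000060`, a log-shipping replica in this model stays at `0/3000000` until the current segment fills up, while a streaming replica can receive those trailing 96 bytes almost immediately.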

Synchronous streaming replication

You can also make streaming replication synchronous by designating one or more replicas as synchronous standbys (with the synchronous_standby_names setting). The primary then confirms a transaction commit only after the synchronous standby or standbys acknowledge that the transaction's WAL records have been persisted.
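The sketch below builds a value in the `FIRST n (...)` or `ANY n (...)` form that `synchronous_standby_names` accepts, and models the quorum (`ANY`) rule for when a commit may return. The standby names are hypothetical `application_name` values, and the acknowledgment check is a simplified model of `ANY` only; `FIRST` uses priority order instead and is not modeled here.

```python
from __future__ import annotations


def synchronous_standby_names(method: str, num_sync: int, standbys: list[str]) -> str:
    """Build a value for PostgreSQL's synchronous_standby_names setting."""
    if method not in ("FIRST", "ANY"):
        raise ValueError("method must be FIRST or ANY")
    if not 1 <= num_sync <= len(standbys):
        raise ValueError("num_sync must be between 1 and the number of standbys")
    return f"{method} {num_sync} ({', '.join(standbys)})"


def commit_acknowledged(commit_lsn: int, flushed_lsns: dict[str, int], num_sync: int) -> bool:
    """Quorum ('ANY') rule: the commit returns once at least num_sync standbys
    report that they have flushed WAL up to the commit record."""
    acks = sum(1 for lsn in flushed_lsns.values() if lsn >= commit_lsn)
    return acks >= num_sync
```

With `ANY 2 (s1, s2, s3)`, a commit succeeds as soon as any two of the three standbys have persisted it, so one slow or failed standby does not stall the primary.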

What key technologies power PostgreSQL high availability?

Repmgr

One of the more “traditional” failover systems, repmgr was originally designed to make creating PostgreSQL replicas easier. It’s written in C and uses a custom Raft-like consensus, which means it needs at least three nodes to operate.
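The three-node minimum follows from majority-quorum arithmetic, which applies to any consensus scheme that requires a strict majority. This is a generic sketch, not repmgr's code. In a two-node cluster the survivor of a failure holds only one of two votes, so it cannot tell a dead peer from a network partition and must not promote itself.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of votes that is a strict majority of the cluster."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while the survivors can still form a majority."""
    return nodes - quorum_size(nodes)
```

Two nodes tolerate zero failures, three tolerate one, and five tolerate two. A fourth node adds no extra tolerance over three, which is why odd-sized clusters are common.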

Patroni

Patroni is the first “modernized” failover system. Written in Python, it doesn’t implement quorum itself. Instead, it defers consensus handling to an external distributed configuration store such as etcd and uses a leader lease that only one node can hold at a time.
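A leader lease is a record with an expiry time. The current leader keeps renewing it, and another node may take it only after it expires. The class below is a toy, single-process model with an explicit clock and a hypothetical 30-second TTL. It is not Patroni's API. In a real deployment the acquire step must be an atomic compare-and-set in the external store, which is exactly what etcd provides.

```python
from typing import Optional


class LeaderLease:
    """Toy model of a leader lease held in a shared store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, node: str, now: float) -> bool:
        """Take or renew the lease. Succeeds only if the lease is free,
        expired, or already held by this node."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False
```

If `pg1` takes the lease at t=0 and renews it at t=20, `pg2` is refused at t=40 but succeeds at t=50, once `pg1` has stopped renewing. A leader that cannot renew its lease is expected to demote itself, so two primaries never hold it at once.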

Pg_auto_failover

Rather than relying on consensus, the pg_auto_failover high availability tool employs a sophisticated state machine where a single monitor process makes decisions for the entire cluster, making it an outlier.
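The contrast with consensus-based tools is that one process decides every node's next state. The sketch below is a heavily simplified, hypothetical model: the state names, events, and `Monitor` class are illustrative and do not match pg_auto_failover's actual states. It shows a single monitor applying a transition table, so promoting a standby requires no vote.

```python
from __future__ import annotations

# Hypothetical, simplified states and events; pg_auto_failover's real state
# machine has more states and different names.
TRANSITIONS = {
    ("primary", "health_check_failed"): "demoted",
    ("secondary", "primary_failed"): "promoting",
    ("promoting", "promotion_done"): "primary",
    ("demoted", "rejoined"): "secondary",
}


class Monitor:
    """A single monitor process that owns the assigned state of every node."""

    def __init__(self, nodes: dict[str, str]):
        self.assigned = dict(nodes)

    def handle(self, node: str, event: str) -> str:
        key = (self.assigned[node], event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {key}")
        self.assigned[node] = TRANSITIONS[key]
        return self.assigned[node]

    def report_primary_failure(self, primary: str, standby: str) -> None:
        # One decision-maker updates both nodes, so there is no vote to hold.
        self.handle(primary, "health_check_failed")
        self.handle(standby, "primary_failed")
```

The trade-off is that the monitor itself becomes a component whose availability you must plan for.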

When is standard PostgreSQL replication not enough to maintain high availability?

PostgreSQL’s Native Logical Replication (PNLR) has a few fundamental limitations that can affect high availability systems. These include, but are not limited to:

Data Definition Language (DDL) operations are not replicated
There is no built-in ability to fail over
Each replicated table needs a primary key (or another replica identity) so that UPDATE and DELETE operations can be replicated
PNLR is not integrated with backup and recovery solutions
PNLR does not come with best practices and proven architectures for achieving common tasks
PNLR only replicates in one direction
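The primary key requirement comes from the replica identity, the information logical replication uses to find the old row on the subscriber. This sketch encodes PostgreSQL's rule for whether a table's UPDATE and DELETE operations can be published. The `Table` class is a hypothetical stand-in for catalog metadata. It also assumes that any `index` identity points at a valid unique index.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    has_primary_key: bool
    # One of PostgreSQL's REPLICA IDENTITY settings: "default", "index", "full", "nothing".
    replica_identity: str = "default"


def can_publish_updates_and_deletes(table: Table) -> bool:
    """Whether logical replication can identify rows changed by UPDATE/DELETE."""
    if table.replica_identity in ("full", "index"):
        # FULL logs the entire old row; INDEX uses a chosen unique index
        # (assumed here to be valid).
        return True
    if table.replica_identity == "default":
        # DEFAULT uses the primary key, if the table has one.
        return table.has_primary_key
    # NOTHING records no old-row identity at all.
    return False
```

A table without a usable replica identity can still replicate INSERTs, but once it is in a publication that publishes updates or deletes, those operations fail on the publisher.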

How does EDB Postgres Distributed guarantee high availability database clusters?

EDB PGD architecture promotes high availability for your database clusters through various techniques:

Automatic failover/switchover
Uninterrupted online database maintenance
Patching the system/Postgres with no impact
In-place major version upgrade
Increasing resources on the system
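Patching, upgrades, and resource increases without downtime generally follow a rolling pattern: take one node out of service at a time and handle the write leader last. The sketch below is a generic plan generator under that assumption. It is not EDB PGD's documented procedure, and the function name and node names are hypothetical.

```python
from __future__ import annotations


def rolling_maintenance_order(nodes: list[str], write_leader: str) -> list[str]:
    """Plan a one-node-at-a-time maintenance sequence.

    Non-leader nodes are patched first; the current write leader goes last,
    after writes have been switched over to a node that is already patched.
    """
    if write_leader not in nodes:
        raise ValueError("write leader must be one of the nodes")
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to patch without downtime")
    others = [n for n in nodes if n != write_leader]
    steps = [f"patch {n}" for n in others]
    steps.append(f"switch over writes from {write_leader} to {others[0]}")
    steps.append(f"patch {write_leader}")
    return steps
```

For nodes `a`, `b`, and `c` with `b` as the write leader, the plan patches `a` and `c`, switches writes from `b` to `a`, and then patches `b`. Only a planned switchover interrupts writes, and only briefly.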

What is Active-Active architecture?

EDB Postgres Distributed is the first to deliver Active-Active architecture.

Active-Active architecture, or Geo-Distributed Active Architecture, is a data resiliency architecture that spreads database information across geographically dispersed nodes and clusters. It is a network of separate processing nodes with access to a common replicated database. All nodes can serve the application, which gives each region low local latency and lets each region keep running in isolation.

What else can EDB PGD do besides provide high availability PostgreSQL?

In addition to providing high availability, EDB Postgres Distributed can also:

Distribute workloads geographically

For example, if you have a three-node EDB PGD architecture with nodes spread across the globe, each country's workload can be handled by its local database node.
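A minimal routing sketch makes the idea concrete. The country-to-node mapping and node names are hypothetical, and the fallback rule (take the first other available node) is an assumption for illustration, not how PGD routes connections. Each client goes to its own country's node, and traffic moves elsewhere only when that node is unavailable.

```python
from __future__ import annotations

# Hypothetical node names for a three-node cluster spread across the globe.
NODES_BY_COUNTRY = {
    "india": "pgd-node-mumbai",
    "germany": "pgd-node-frankfurt",
    "usa": "pgd-node-virginia",
}


def route(client_country: str, available: set[str]) -> str:
    """Send a client to its own country's node; fall back to another
    available node if the local one is down."""
    local = NODES_BY_COUNTRY.get(client_country)
    if local in available:
        return local
    for node in NODES_BY_COUNTRY.values():
        if node in available:
            return node
    raise RuntimeError("no database node is available")
```

In normal operation each country's traffic stays on its local node for low latency. If the Mumbai node goes down, Indian clients are redirected to another region at the cost of higher latency rather than an outage.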

Provide data localization security

Advanced logical PostgreSQL replication in EDB PGD also lets you choose access rights and maintain data sovereignty, protecting your organization and limiting threats.
