Why OpenAI Should Use Postgres Distributed

Jozef de Vries

February 03, 2026

Last week, OpenAI detailed their design of scaling a single-primary PostgreSQL instance to support 800 million users. Shortly after, MariaDB claimed their product could have handled those same users without the "cracks" OpenAI encountered.

While OpenAI’s approach certainly demonstrated deep Postgres design and operational expertise, it also showed the lengths teams must go to in order to serve modern workloads on a single-writer architecture.

MariaDB’s response suggests that the better option is to leave the Postgres ecosystem entirely. The argument against this is that Postgres has won: it is the fastest growing database across the board and has consistently ranked above MariaDB by factors of 10 for almost the past 10 years according to the yearly Stack Overflow survey.

In part it has won because of its highly extensible foundation and vibrant ecosystem of innovation. It is precisely that combination of extensibility and innovation that empowers its users to do more than ‘off the shelf’ Postgres can do. For today’s high-volume workloads, there are better solutions that do not require moving away from Postgres.

A Better Way: Active-Active Postgres with EDB Postgres Distributed

With EDB Postgres Distributed (PGD), organizations don’t have to work within the confines of traditional physical streaming replication models. Nor do they have to go through the painstaking process of moving to a new database engine. PGD enables an active-active, distributed Postgres architecture while remaining 100% Postgres.

PGD is a cluster management solution that enables its users to meet the scalability and reliability requirements of applications that support core national infrastructure, such as emergency services, utility grids, and global payment processing, to name just a few.

PGD transforms standard PostgreSQL into a distributed, always-on database platform designed for modern enterprise scalability, supporting scale-out read performance, granular data locality, and seamless lifecycle management across all nodes and regions.

Part 1: Why Azure "Flexible Server" Hits a Wall

OpenAI used Azure Database for PostgreSQL Flexible Server, a robust service, but one fundamentally built on a Single-Primary architecture. As their user base exploded, the systemic risks to business continuity became clear.

1. The Revenue Risk: The Single-Writer Bottleneck

In Azure Flex, every single write, every ChatGPT conversation and billing record, must pass through one primary node. OpenAI was forced to move write-heavy transactions to Cosmos DB just to handle the load.

The business risk: If that primary node or its Availability Zone (AZ) suffers a disruption, OpenAI potentially faces a total write outage. A 60-second failover in Azure isn't just a hiccup; it’s millions of dollars in lost customer trust.
The PGD advantage: PGD's Active-Active (Multi-Master) architecture offers a significant advantage. It increases write throughput by distributing write workloads across multiple nodes and regions, and its fast failover removes the single point of failure inherent in a single write leader.
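
To put the 60-second failover in perspective, here is a minimal sketch of the arithmetic. The 99.99% availability target and the 30-day window are assumptions chosen for illustration; neither figure comes from OpenAI or Azure.

```python
# Sketch: how much of a downtime budget a single failover consumes.
# The 99.99% target and 30-day window are illustrative assumptions.

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def downtime_budget_seconds(availability_target: float, period_seconds: float) -> float:
    """Seconds of downtime allowed in the period at the given availability target."""
    return (1.0 - availability_target) * period_seconds


def budget_fraction_used(outage_seconds: float, availability_target: float,
                         period_seconds: float) -> float:
    """Fraction of the downtime budget consumed by one outage."""
    return outage_seconds / downtime_budget_seconds(availability_target, period_seconds)


if __name__ == "__main__":
    budget = downtime_budget_seconds(0.9999, THIRTY_DAYS_SECONDS)
    used = budget_fraction_used(60, 0.9999, THIRTY_DAYS_SECONDS)
    print(f"budget: {budget:.1f} s, one 60 s failover uses {used:.0%}")
```

Under those assumptions, one 60-second write outage uses about 23% of the month's budget, so a handful of failovers exhausts it.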

2. The Operational Risk: Maintenance as a Liability

According to OpenAI, even routine database changes requiring administrative intervention or cleanup could create measurable performance risk for their 800 million users. That reality placed significant limits on schema and system design, forcing difficult trade-offs where optimization often had to yield to availability and operational stability.

The business risk: For a global 24/7 service, there is no "off-peak." You are forced to choose between performance degradation and the disruption risk that comes with routine maintenance.
The PGD advantage: PGD provides Always On Maintenance. You can perform a REINDEX, VACUUM FULL, or major version upgrade on one node at a time without stopping the write stream on others. In addition, PGD provides ‘cluster aware orchestration’ so you can issue a single command and it will run sequentially across the cluster. In other words, routine maintenance does not compromise availability or increase operational overhead of the system.
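
The node-at-a-time pattern described above can be sketched as a small simulation. This is a simplified model, not PGD's orchestration: the `Node` class, the drain-before-maintenance step, and `rolling_maintenance` are hypothetical names introduced for illustration.

```python
# Sketch: rolling maintenance across a multi-writer cluster.
# Hypothetical model; it does not call any PGD API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    accepting_writes: bool = True
    maintained: bool = False


def rolling_maintenance(nodes: List[Node], run_task: Callable[[Node], None]) -> int:
    """Run run_task on one node at a time.

    Returns the smallest number of write-accepting nodes observed
    at any point during the procedure.
    """
    min_writers = sum(n.accepting_writes for n in nodes)
    for node in nodes:
        node.accepting_writes = False            # take only this node out of rotation
        min_writers = min(min_writers, sum(n.accepting_writes for n in nodes))
        run_task(node)                           # e.g. REINDEX or an upgrade step
        node.maintained = True
        node.accepting_writes = True             # return it before touching the next
    return min_writers


cluster = [Node("node-a"), Node("node-b"), Node("node-c")]
fewest_writers = rolling_maintenance(cluster, lambda n: None)
```

In this model a three-node cluster never drops below two write-accepting nodes, which is the property that a single-primary system cannot offer during the same maintenance.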

3. The Customer Satisfaction Risk

OpenAI initially suffered incidents when their 5,000-connection limit was hit, causing cascading failures. To mitigate this, they had to add architectural complexity: external proxies and application-side rate limiting.

The business risk: When engineers spend 50% of their time building workarounds for database limits, innovation stalls.
The PGD advantage: PGD includes a Native Connection Manager. It manages thousands of connections and updates routing in under a second, preventing connection overload before it starts.
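
For a sense of what the application-side workaround involves, here is a minimal sketch of a connection cap that sheds load instead of queuing once the limit is reached. It is an illustration of the general technique, not OpenAI's implementation; `ConnectionLimiter` and its `slot` method are hypothetical names.

```python
# Sketch: application-side cap on concurrent database connections.
# Hypothetical helper; a real service would wrap its driver's connect call.
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    def __init__(self, max_connections: int) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def slot(self):
        """Hold one connection slot for the duration of the with-block."""
        # Fail fast rather than wait: queued callers are what turn a full
        # connection table into a cascading failure.
        if not self._slots.acquire(blocking=False):
            raise TimeoutError("connection limit reached; shedding load")
        try:
            yield
        finally:
            self._slots.release()


def try_query(limiter: ConnectionLimiter) -> bool:
    """Return True if a slot was available, False if the request was shed."""
    try:
        with limiter.slot():
            return True  # the database call would go here
    except TimeoutError:
        return False
```

Every service that talks to the database has to carry and tune logic like this, which is the overhead a connection manager built into the database layer is meant to remove.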

Part 2: High Availability vs. Business Resilience

Feature / Risk Factor	Azure PG Flexserver (OpenAI's Current State)	EDB Postgres Distributed (PGD)
Write Architecture	Single Primary (High Risk Bottleneck)	Multi-Master (Resilient Active-Active)
Read Scalability	Primary Overload	Subscriber-Only Node and node group expandable beyond 200 total nodes
Recovery Time (RTO)	Minutes (Dependent on cloud fabric and physical replication)	Sub-second (leader-election)
Maintenance	Potential for degradation; maintenance must be carefully coordinated between app and DBA teams to avoid disruption.	Online maintenance supports upgrades, DDL, VACUUM, REINDEX, etc.
Data Durability	Standard sync/async replicas. Potential for unbounded RPO unless heavy penalty paid for synchronous replication.	Multiple commit scopes for durability and consistency
Dev Velocity	Engineers are limited by database rules and operations	DB scales and adapts to engineers’ needs
Portability	Runs only on Azure Cloud	Runs on all clouds, on-prem, Kubernetes, and future Stargate data centers on Oracle Cloud

Part 3: Addressing the MariaDB "Alternative"

MariaDB suggests their Galera clusters avoid the "Postgres cracks." However, their argument ignores the most important trend: Postgres has won developers' hearts and minds.

Countering the Technical Claims: Commit Scopes

MariaDB claims Galera is superior, but it relies on Synchronous Replication, which creates a "Certification Wall." If you have a node in another region, every write must wait for global agreement, causing massive latency that only gets worse the more distributed your system becomes.
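
The latency cost of waiting for every node can be illustrated with a deliberately simplified model: a commit returns once the required number of peers has acknowledged, so the wait is set by the slowest peer it must hear from. The round-trip times below are hypothetical, and the model ignores certification and apply time.

```python
# Sketch: commit wait as a function of how many peers must acknowledge.
# Round-trip times are made-up illustrative values.
from typing import Sequence


def commit_wait_ms(peer_rtts_ms: Sequence[float], required_acks: int) -> float:
    """Time until required_acks peers have acknowledged, assuming parallel sends."""
    if not 1 <= required_acks <= len(peer_rtts_ms):
        raise ValueError("required_acks must be between 1 and the number of peers")
    # The k-th fastest peer determines when the k-th acknowledgement arrives.
    return sorted(peer_rtts_ms)[required_acks - 1]


peer_rtts = {"same-region-a": 1.5, "same-region-b": 2.0, "cross-continent": 140.0}
rtts = list(peer_rtts.values())

wait_for_all = commit_wait_ms(rtts, len(rtts))  # bounded by the remote region
wait_for_two = commit_wait_ms(rtts, 2)          # bounded by the nearby peers
```

Requiring every peer ties each commit to the cross-continent round trip, while requiring only two keeps the wait within the local region. That difference is what a configurable acknowledgement policy is meant to exploit.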

PGD offers a more sophisticated answer: Commit Scopes, which allow you to configure and control how data is written and committed to the system. This flexibility enables users to define workload patterns factoring in considerations for performance, durability, and availability.

Synchronous Commit: This mode commits a transaction locally but keeps its locks open until a configured quorum of nodes acknowledges receipt. It includes an Auto-Degrade functionality that can automatically switch to asynchronous mode if the synchronous nodes become unavailable, balancing durability with system availability.
Lag Control: Designed for asynchronous replication, this feature allows you to set a threshold (in time or bytes) for replication lag. If the risk of data loss exceeds this configured limit, PGD automatically throttles or inserts delays into the local commit to bound the risk.
Eager Conflict Resolution: A pessimistic resolution strategy that detects and manages potential conflicts before a transaction is committed, ensuring data consistency across the cluster.
Synchronous Replication with Consensus: PGD's durability is built on a custom Raft-based consensus implementation, which automates leader election and ensures that all participating nodes have a consistent view of the cluster state and transaction history.
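
As an illustration of the Lag Control idea in the list above, the sketch below delays local commits once replication lag passes a threshold, up to a fixed ceiling. The proportional policy and the names `commit_delay_ms`, `max_lag_bytes`, and `max_commit_delay_ms` are assumptions made for this example and are not PGD's actual algorithm or settings.

```python
# Sketch: bound replication lag by slowing down local commits.
# Hypothetical policy for illustration only.


def commit_delay_ms(lag_bytes: float, max_lag_bytes: float,
                    max_commit_delay_ms: float) -> float:
    """Delay to add to a local commit given the current replication lag."""
    if max_lag_bytes <= 0:
        raise ValueError("max_lag_bytes must be positive")
    if lag_bytes <= max_lag_bytes:
        return 0.0  # within the configured bound: commit at full speed
    # Scale the delay with how far lag overshoots the bound, capped at the ceiling.
    overshoot = (lag_bytes - max_lag_bytes) / max_lag_bytes
    return min(max_commit_delay_ms, overshoot * max_commit_delay_ms)
```

The point of the design is that writers slow down gradually as replicas fall behind, so potential data loss stays bounded without paying the cost of fully synchronous replication on every commit.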

Purpose-Built

OpenAI’s ability to scale to 800 million users highlights the power of Postgres, but it has not tapped Postgres's real potential. It is a remarkable display of ingenuity, and a testament to the deep and broad expertise behind the partnership of the two companies, but it still raises the question of whether it is sustainable. It also raises the question of whether Azure Database for PostgreSQL Flexible Server is designed to repeat this kind of architecture for a dozen, or hundreds, of similar deployments.

EDB Postgres Distributed is. It is designed inherently for always-on workloads, and specifically to serve some of the most critical workloads around the globe. And PGD does this hundreds of times over.

If you are responsible for maintaining the durability and availability of core national infrastructure systems, do you want to deploy a system that is inherently designed for your workloads, or one that is cleverly configured for them?

Imagine if a global payment processing platform went offline for 60 seconds…

Imagine if your country’s national power grid orchestration all routed through a single primary leader…

Imagine if your nation’s emergency services risked maintenance outages because someone executed a long running query…
