Why Your Analytical Database Needs Multiple Clusters to Do What WarehousePG Does With One

December 16, 2025

This blog is co-authored by Jack Christie and Dave Stone.

When Querying Business Data Becomes a Budget Problem

"We have been plagued by runaway costs for querying our 50TB cloud data warehouse."

That's Mr. Jung Heung Sik, Head of IT Support at Kyobo Book Centre (South Korea's largest bookstore chain), describing a problem that's become increasingly common: the consumption-based pricing model that made cloud data warehouses attractive for data engineering and scheduled analytics creates unpredictable costs under high-concurrency BI workloads, where hundreds of business users simultaneously refresh dashboards, drill into reports, and trigger real-time queries.

Kyobo had 50TB of business data and growing, a small analytics team supporting Tableau users, and years remaining on their cloud data warehouse contract. Every dashboard refresh generated charges. Every analyst query added to the bill. Performance required constant optimization, and the team was spending more time managing costs than delivering insights.

They couldn't migrate because they were contractually committed. But they also couldn't sustain the economics of consumption-based compute for high-concurrency analytics workloads.

The Concurrency Challenge: Why BI Workloads Scale Differently

Cloud data warehouses excel at what they were designed for: large-scale data engineering and scheduled analytical workloads. The consumption model makes sense when you're running periodic ETL jobs or training ML models. You use resources, then they scale down.

The Modern Pace of Business Intelligence

Modern analytics is different. You have business users refreshing dashboards throughout the day. Data scientists running exploratory queries. Financial analysts generating reports. AI agents providing conversational analytics insights. All simultaneously.

This is where the architectural choices behind modern cloud platforms create challenges. A recent analysis by Principal Architect Nick Akincilar tested how platforms handle concurrent BI users running hundreds of unique queries against a multi-billion row dataset (a realistic enterprise scenario).

The results showed significant differences in how platforms manage concurrency:

  • Resource scaling varied dramatically, with some platforms spinning up 5 clusters while others needed only 2 for the same workload
  • Queue times ranged from near-instant to 30+ seconds as platforms decided how to allocate resources
  • Query failure rates in some cases reached nearly 4% due to connection management issues
  • Costs differed by 73% for identical workloads, based purely on how platforms auto-scale

As Akincilar observed: "Nothing like trying to save $1 on data engineering to spend $4 extra on consumption and have business complaining about their dashboards."

Why Concurrency Matters for the AI Generation

When we talk about modern analytics, nothing comes to mind faster than agentic analytics: autonomous AI agents querying analytical databases just as a human analyst would. This puts even more concurrent load on the warehouse, and it’s not just straightforward user growth.

Think of each agent as a superhuman user, writing and executing hundreds of queries in a matter of seconds, then evaluating the results to determine whether more queries are needed to complete its goal. Even if cloud data warehouses could keep up, the budget would explode before performance even had a chance to tank.
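To make that multiplication concrete, here's a back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (query rates, cycle lengths, headcounts), not data from any benchmark:

```python
# Hypothetical, illustrative numbers only -- none come from a real workload.
human_analysts = 50
human_queries_per_hour = 5           # a dashboard refresh, a few drill-downs

agents = 50                          # same headcount, now agentic
queries_per_agent_cycle = 200        # "hundreds of queries in seconds"
cycles_per_hour = 12                 # agent re-evaluates every 5 minutes

human_load = human_analysts * human_queries_per_hour             # 250/hour
agent_load = agents * queries_per_agent_cycle * cycles_per_hour  # 120,000/hour

print(f"load multiplier: {agent_load // human_load}x")  # 480x
```

Swap in your own rates and the conclusion holds: agents don't add users, they multiply the query volume each "user" generates.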

Why Architecture Matters for Concurrent Workloads

Modern cloud data warehouses were built for elastic workloads (those unpredictable spikes where you suddenly need 10x the compute, then scale back down an hour later). They handle this by spinning up additional clusters when demand increases, then billing you for what you used.

This works beautifully for data engineering: batch jobs that run at 2 AM, ML training that happens weekly, ETL pipelines that spike and finish. Variable work deserves variable pricing.

But BI workloads aren't variable. They’re predictable, even with agents.

While agentic analytics introduces a stream of unpredictable queries, the overall BI workload pattern is still highly predictable. You know roughly how many analysts and agents will be querying data during business hours. Elastic pricing is optimized for unpredictable peaks in demand (a huge, temporary spike), whereas AI agents represent a predictable, sustained increase in the baseline concurrency.

The problem: when platforms designed for variable workloads meet predictable concurrency patterns, you get unpredictable costs. More users means more clusters means higher bills, even though your actual usage pattern hasn't changed, just the number of people accessing data simultaneously.

The equation is simple: elastic infrastructure creates elastic costs. For workloads that aren't actually elastic, that's a mismatch.

The Postgres Alternative: Predictable Performance, Predictable Costs

Postgres-based data warehouses take a fundamentally different approach. Built on MPP (Massively Parallel Processing) architecture with decades of proven technology, these platforms handle concurrent workloads without requiring multiple cluster orchestration.

Figure 1: MPP architecture scales performance by distributing rows and processing work across multiple independent segment hosts. The segment coordinator creates the optimal execution plan, instructing each segment on how to process its specific portion of the data concurrently (e.g. directing 16 segments to each process 1M rows of a 16M row table).
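The distribution step in Figure 1 can be sketched in a few lines. This is a simplified model of hash distribution, not WarehousePG internals; the key name and hash choice are assumptions made for the example:

```python
# Minimal sketch of how an MPP coordinator spreads rows across segments
# by hashing a distribution key. Column name and hash function are
# illustrative assumptions, not WarehousePG's actual implementation.
import hashlib

NUM_SEGMENTS = 16

def segment_for(key: str) -> int:
    """Map a distribution-key value to one of 16 segments."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SEGMENTS

# A uniform key spreads rows into near-equal shards, so each segment
# scans its ~1/16th of the table in parallel instead of one node
# scanning everything. (Scaled-down sample: 160k rows, not 16M.)
rows = [f"customer_{i}" for i in range(160_000)]
shard_sizes = [0] * NUM_SEGMENTS
for key in rows:
    shard_sizes[segment_for(key)] += 1

print(min(shard_sizes), max(shard_sizes))  # both near 10,000
```

The flip side, which the sketch also makes visible, is why distribution-key choice matters: a skewed key piles rows onto a few segments and the slowest shard sets the query's finish time.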


Independent benchmark testing shows compelling results for this approach. Early findings indicate that Postgres-based architectures deliver up to 62% cost savings for high-concurrency, high-volume analytics workloads, and handle concurrency scaling up to 63% more efficiently than leading cloud data warehouses, while maintaining predictable cost structures and consistent user experience. Full benchmark results will be available in the coming weeks.

For mission-critical operations, this stability matters. MNTN, a leading connected TV ad-tech platform managing petabyte-scale data, has been using Postgres-based data warehouses for years. They’ve always been happy with performance, but needed an open source alternative and an enterprise partner to guarantee operational uptime and responsive support. EDB Postgres AI for WarehousePG was their solution. As Greg Spiegelberg, Head of Data at MNTN, put it: "The performance is there, the stability is there, the support is responsive as they should be. I'm just happy that there's somebody there that can be with me in the middle of the night and I'm not, quite literally, hacking open-source code trying to get the database recovered."

Simple Beats Complex for Known Workloads

When you know you'll have consistent BI workloads (daily dashboard refreshes, regular reporting cycles, predictable analyst queries), the path forward doesn't require a high-risk infrastructure replacement.

The difference comes down to predictability:

Consumption-based platforms optimize for variable workloads. You pay for what you use, but "what you use" depends on hidden variables: how aggressively the platform auto-scales, how long clusters stay warm, whether queries queue or execute immediately, and how many clusters spawn to handle concurrent demand.

Capacity-based platforms optimize for known workloads. You provision cores based on your concurrency needs and pay the same amount whether you run 1,000 queries or 100,000. Where a consumption-based platform would spin up 3-5 additional clusters to handle peak concurrency, your capacity-based costs remain flat.

For organizations with established BI workloads (which is most enterprises), predictability often matters more than elasticity.

Making the transition doesn't mean ripping out your existing infrastructure. Because WarehousePG is rooted in Postgres, teams can leverage familiar SQL skills immediately. For organizations already running Greenplum, the transition is a zero-migration binary swap that can be done in hours, not months.

Euronext FX, a leading pan-European market infrastructure, made exactly this transition to eliminate vendor lock-in with their existing Greenplum deployments. As Grigoriy Zeleniy, Global CTO at Euronext FX, explained: "We're excited to be working with EDB Postgres AI. Its support for Greenplum Workloads is helping us maintain control of where and how we deploy open source software." WarehousePG delivered a seamless binary swap that provided superior enterprise support and open source control across their four global data centers.

In an era of GDPR, data residency requirements, and increasing regulatory scrutiny, this control matters. Organizations need to know exactly where their data resides and who has access, without sacrificing performance or predictability.

Beyond cost predictability, WarehousePG's MPP architecture handles high-volume reporting and BI with workload management that prioritizes critical queries. Native vector capabilities via pgvector support AI feature engineering and model training directly on your data (no need to move data to separate infrastructure). Streaming ingestion handles real-time log and event analysis. And federated queries across data lakes break down data silos without complex ETL pipelines.

The Path Forward

If you're experiencing unpredictable costs under concurrent BI workloads, you're not alone. The consumption model that made cloud data warehouses attractive for data engineering creates different economics for business intelligence.

You have options:

  • Optimize your current platform (implement workload management, tune auto-scaling policies, monitor consumption patterns)
  • Accept variable costs as the price of cloud elasticity
  • Consider whether capacity-based pricing on proven architecture might better match your workload patterns

Kyobo Book Centre chose the third path. They implemented a hybrid approach: data remains in their existing cloud platform within their VPC (maintaining data sovereignty and contract compliance), but analytics queries route through EDB Postgres AI for WarehousePG (a Postgres-based MPP engine with per-core pricing).

The results:

  • Predictable costs: Per-core pricing instead of consumption charges
  • Simplified operations: Their small team manages fewer moving parts
  • Better performance: Direct Tableau connectivity without extraction workflows
  • Maintained sovereignty: Data stays in their VPC, meeting regulatory requirements

The question isn't whether modern cloud platforms can scale (they absolutely can). The question is whether you're paying for multiple clusters and elastic scaling when what you actually need is predictable performance at predictable cost.

Explore EDB Postgres AI for WarehousePG to learn more about how Postgres-based analytics handles high-concurrency workloads with cost predictability, or contact our team for a workload assessment.