Fine Tuning Incremental Backup

September 18, 2025

One of the pitfalls of developing a large new feature, such as the incremental backup feature that I developed for PostgreSQL 17, is that it's sometimes hard to anticipate the practical problems that people will encounter when attempting to actually make use of it. In the case of the incremental backup feature, two major categories of problems have emerged, one related to backup management and the other related to the staging of backups.

Let's talk first about backup management. pg_basebackup takes backups, but it doesn't manage backups for you. For instance, it won't keep track of where you've stored them, and it has no idea about your backup retention policy. For that, you need a tool like barman. Incremental backups make backup management more complicated, because in order to make use of an incremental backup, you will need the earlier backup upon which it depends, and potentially a whole chain of earlier backups. Keeping track of that manually will be extremely error-prone, so it's great news that the barman team has added support for PostgreSQL's incremental backup engine beginning in barman 3.11.0.
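To make the dependency problem concrete, here is a minimal sketch of the bookkeeping a backup manager has to do. It is not how barman implements it; the `backup_chain` function, the catalog layout, and the backup names are all hypothetical, chosen only to illustrate walking from an incremental backup back to the full backup it ultimately depends on.

```python
from typing import Dict, List, Optional


def backup_chain(parents: Dict[str, Optional[str]], target: str) -> List[str]:
    """Return every backup needed to restore `target`, oldest (full) first.

    `parents` maps each backup name to the backup it was taken against,
    or to None if it is a full backup. This catalog format is an
    assumption made for illustration only.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = target
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        if current not in parents:
            # A missing link means the target can no longer be restored.
            raise KeyError(f"missing backup {current!r}; chain is broken")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    chain.reverse()  # we walked newest to oldest; restore needs oldest first
    return chain


# Hypothetical catalog: one full backup followed by two incrementals.
catalog = {"full_mon": None, "incr_tue": "full_mon", "incr_wed": "incr_tue"}
print(backup_chain(catalog, "incr_wed"))
```

The point of the sketch is the failure mode: if a retention policy deletes `full_mon` while `incr_wed` is still needed, the lookup fails, which is exactly the kind of mistake that is easy to make by hand.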

But in so doing, they discovered a problem with backup staging that I hadn't really anticipated. I did understand, when I developed the feature, that restoring incremental backups was going to be somewhat expensive: in order to run the pg_combinebackup tool, which assembles a full backup from an incremental backup and the earlier backups upon which it depends, you're going to need to have all of those backups available on the same machine at the same time, which could involve moving around a lot of bytes. However, it wasn't exactly clear to me what features pg_combinebackup would need to have in order to minimize this excess data copying.

The big issue for barman turned out to be that pg_combinebackup always writes out a completely new data directory, copying files from the various input directories as required. That's inefficient if the input directories are temporary copies that will be discarded, or if the output directory is a temporary staging area that will be deleted after being copied to a remote host. The solution, implemented by my colleague Israel Barth Rubio for PostgreSQL 18, is to give pg_combinebackup a --link option, similar to what already exists for pg_upgrade, which creates hard links to the input files instead of copying them. The barman team plans to enhance barman to make use of this option in the near future, which will then allow users to restore incremental backups using barman (or other tools with similar support) much more cheaply, provided they're using PostgreSQL 18 or newer.
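As a rough illustration of what a tool might run, the sketch below assembles a pg_combinebackup command line from a backup chain. The `combine_command` helper and the directory paths are hypothetical; the `-o` and `--link` options are the real pg_combinebackup options, with `--link` available only from PostgreSQL 18 onward. The input backups are listed oldest first, and since `--link` relies on hard links, this assumes the inputs and the output directory live on the same filesystem.

```python
import shlex
from typing import List, Sequence


def combine_command(chain: Sequence[str], output_dir: str,
                    link: bool = False) -> List[str]:
    """Build an argv list for pg_combinebackup.

    `chain` holds the backup directories, oldest (full) first.
    `link=True` adds --link, which requires PostgreSQL 18 or newer.
    """
    if not chain:
        raise ValueError("at least one backup directory is required")
    cmd = ["pg_combinebackup"]
    if link:
        # Hard-link files into the output instead of copying them.
        cmd.append("--link")
    cmd += ["-o", output_dir, *chain]
    return cmd


# Hypothetical paths, for illustration only.
chain = ["/backups/full_mon", "/backups/incr_tue", "/backups/incr_wed"]
print(shlex.join(combine_command(chain, "/restore/pgdata", link=True)))
```

One caveat worth keeping in mind: because linked files are shared between the inputs and the output, modifying the output directory (for example, by starting a server on it) can also affect the input backups, so this mode fits best when one side is a throwaway staging copy.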

I think this tale demonstrates two things. First, it illustrates the difficulty of anticipating exactly where the friction around a new feature will be located. Second, it underscores the importance of collaboration between core developers and tool authors; working together, we can make the PostgreSQL experience better for everyone.
