deduplicating backup of multiple pg_dump dumps

From: Egor Duda <egor(dot)duda(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: deduplicating backup of multiple pg_dump dumps
Date: 2018-01-29 14:01:06
Message-ID: 5f4d2bdd-e0b6-35e7-501d-e37e5e41a92f@gmail.com
Lists: pgsql-admin

Hello!

I've recently tried to use borg backup (https://borgbackup.readthedocs.io/) to store multiple
PostgreSQL database dumps and ran into a problem. Because pg_dump's row order is nondeterministic,
table rows come out in a different order on each invocation, which defeats borg backup's chunking
and deduplication algorithm.

As a result, each new dump in the backup reuses almost no data from previous dumps, so multiple
database dumps cannot be stored nearly as efficiently as they could be.
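To illustrate why row order matters, here is a toy model of content-defined chunking. It assumes a simple hash over a sliding 16-byte window in place of borg's actual buzhash chunker, plus made-up table rows. Chunk boundaries depend only on nearby bytes, so an identical dump produces identical chunks, while the same rows in shuffled order produce chunks that rarely match anything already stored.

```python
import hashlib
import random


def chunk(data: bytes, mask: int = 0x3F, window: int = 16) -> list[bytes]:
    """Toy content-defined chunker (not borg's real one): cut after position i
    whenever a hash of the preceding `window` bytes has its low bits (selected
    by `mask`) all zero. Boundaries depend only on local content."""
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        digest = hashlib.blake2b(data[i - window:i], digest_size=4).digest()
        if int.from_bytes(digest, "big") & mask == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def reused_fraction(old: bytes, new: bytes) -> float:
    """Fraction of `new` (by bytes) covered by chunks already stored for `old`."""
    stored = {hashlib.sha256(c).digest() for c in chunk(old)}
    reused = sum(len(c) for c in chunk(new) if hashlib.sha256(c).digest() in stored)
    return reused / len(new)


# Hypothetical table data in COPY-like text form: one tab-separated row per line.
rows = [f"{i}\tcustomer_{i}\t{i * 7 % 1000}\n" for i in range(2000)]
first = "".join(rows).encode()
same_order = "".join(rows).encode()

shuffled_rows = rows[:]
random.Random(42).shuffle(shuffled_rows)
reordered = "".join(shuffled_rows).encode()

print(f"same row order:     {reused_fraction(first, same_order):.0%} reused")
print(f"shuffled row order: {reused_fraction(first, reordered):.0%} reused")
```

The first line reports 100% reuse; the second reports only a small fraction, with the exact figure depending on the toy parameters. A real chunker uses larger chunks, which makes the effect of reordering, if anything, worse.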

I wonder if there's any way to force pg_dump to use a predictable ordering of data rows (for
example, by primary key, where possible) to make dumps more uniform, similar to mysqldump's
--order-by-primary option?
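If no such option is available, one possible workaround (an assumption on our part, not a pg_dump feature) is to normalize a plain-format dump (pg_dump --format=plain) after the fact. Table data in that format appears as "COPY ... FROM stdin;" blocks terminated by a line containing only \. and, in COPY text format, embedded newlines are escaped, so each row occupies one line. The hypothetical helper sort_copy_blocks below sorts the rows of each block so that identical table contents always serialize identically.

```python
def sort_copy_blocks(dump_text: str) -> str:
    """Sort the data rows inside every `COPY ... FROM stdin;` block of a
    plain-format pg_dump; all other lines are passed through unchanged."""
    out: list[str] = []
    rows: list[str] = []
    in_copy = False
    for line in dump_text.splitlines(keepends=True):
        if in_copy:
            if line.rstrip("\n") == "\\.":  # end-of-data marker
                out.extend(sorted(rows))  # plain text sort, not primary-key order
                out.append(line)
                rows, in_copy = [], False
            else:
                rows.append(line)
        else:
            out.append(line)
            if line.startswith("COPY ") and line.rstrip("\n").endswith("FROM stdin;"):
                in_copy = True
    out.extend(rows)  # truncated dump with no "\." terminator: keep rows as-is
    return "".join(out)


# Two hypothetical dumps of the same table with rows in different physical order.
dump_a = "COPY public.t (id, name) FROM stdin;\n2\tb\n1\ta\n\\.\n\n"
dump_b = "COPY public.t (id, name) FROM stdin;\n1\ta\n2\tb\n\\.\n\n"
print(sort_copy_blocks(dump_a) == sort_copy_blocks(dump_b))  # True
```

The sort is lexical, so a key of "10" sorts before "2"; the result is deterministic but not primary-key order. Restoring the sorted dump loads the same rows, since SQL tables carry no inherent row order. The sketch reads the whole dump into memory, which may not suit very large databases.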

Egor.
