Re: deduplicating backup of multiple pg_dump dumps

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Egor Duda <egor(dot)duda(at)gmail(dot)com>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: deduplicating backup of multiple pg_dump dumps
Date: 2018-01-29 14:03:40
Message-ID: 1517234620.2622.41.camel@cybertec.at
Lists: pgsql-admin

Egor Duda wrote:
> I've recently tried to use borg backup (https://borgbackup.readthedocs.io/) to store multiple
> PostgreSQL database dumps and ran into a problem. Due to the nondeterministic nature of pg_dump,
> table rows can come out in a different order on each invocation, which breaks borg backup's
> chunking and deduplication algorithm.
>
> This means that each new dump in the backup almost never reuses data from previous dumps, so
> multiple database dumps cannot be stored as efficiently as they could be.
>
> I wonder if there is any way to force pg_dump to use a predictable ordering of data rows (for
> example, by primary key, where possible) to make dumps more uniform, similar to the mysqldump
> --order-by-primary option?

There is no such option.

I think you would be better off with physical backups taken with "pg_basebackup" if you
want to deduplicate, at least if the deduplication works at the block level.
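
To illustrate, a minimal sketch of such a workflow (the directories and the borg repository
below are placeholders, and the commands are an untested sketch, not a recommendation of
specific options):

  # take a plain-format physical base backup, streaming the required WAL along with it
  pg_basebackup -D /backups/base -F plain -X stream

  # let borg chunk and deduplicate the backup directory
  borg create --stats /path/to/borg-repo::pg-{now} /backups/base

Since consecutive base backups of the same cluster share most of their data files block for
block, borg's chunking should have a much better chance of finding duplicates than it does
with logical dumps.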

Yours,
Laurenz Albe
