Re: How to get a more RSYNC compatible output of pg_dump?

From: Holger Jakobs <holger(at)jakobs(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: How to get a more RSYNC compatible output of pg_dump?
Date: 2022-05-16 10:52:41
Message-ID: a7e6aff9-7f13-5ebd-db1c-2cb899664909@jakobs.com
Lists: pgsql-admin

On 16.05.22 at 09:56, Thorsten Schöning wrote:
> Hi everyone,
>
> for various historical reasons I maintain a database containing large
> file uploads, which makes uncompressed output of pg_dump ~200 GiB in
> size currently. I'm storing that dump to some NAS and am trying to
> forward it from there using RSYNC to multiple different additional
> offsite USB disks.
>
> I'm doing the same with the files directory of Postgres already after
> taking BTRFS snapshots etc. and for those files things work pretty
> well with RSYNC. Lots of files are skipped entirely, some are slightly
> updated in-place, some updates are a bit larger depending on the
> actual changes and when RSYNC executed last etc.
>
> Though, with the large dumps it seems to me that with every slight
> change in the actual data the entire dump gets downloaded again. I'm
> already using uncompressed dumps in the hope that the output is more
> stable and RSYNC better able to recognize unchanged parts. But I guess
> that most changes in the dumped data simply shift all subsequent
> data relative to the copy RSYNC compares against, so that in the end
> it's like downloading the whole file again.
>
> Is that simply the way it is or are there some optimizations possible
> when using pg_dump? Am using Postgres 11 and don't see anything which
> seems to help in this use-case.
>
> Thanks!
>
> Kind regards
>
> Thorsten Schöning
>
Hi Thorsten,

This is an rsync question, not a pg_dump question.

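If you want to sync a new version of a file without transferring the
whole thing, you have to rely on rsync's delta-transfer algorithm. It is
the default for transfers over the network; for copies between two local
paths you have to enable it with --no-whole-file. The option -c or
--checksum only changes how rsync decides whether a file needs to be
updated at all.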

This works well only if some blocks of the file have changed, while most
others haven't. This won't be the case for a pg_dump.
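
To illustrate why this depends on most blocks staying unchanged, here is
a minimal sketch in Python. It is not rsync's actual implementation:
rsync uses a cheap rolling checksum plus a strong checksum and much
larger blocks, while this toy version hashes every offset with SHA-256
and is only meant for tiny inputs. The function name literal_bytes and
the block size of 8 are made up for the example.

```python
import hashlib


def literal_bytes(old: bytes, new: bytes, block_size: int = 8) -> int:
    """Count the bytes of `new` that cannot be rebuilt from blocks of `old`.

    Toy model of block matching (assumption: illustrative only, not
    rsync's real algorithm or block size).
    """
    # Hashes of the old file's blocks, taken at fixed block boundaries.
    known = {
        hashlib.sha256(old[i:i + block_size]).digest()
        for i in range(0, len(old) - block_size + 1, block_size)
    }

    literal = 0
    pos = 0
    while pos < len(new):
        block = new[pos:pos + block_size]
        if len(block) == block_size and hashlib.sha256(block).digest() in known:
            # The receiver already has this block; only a reference is sent.
            pos += block_size
        else:
            # No match at this offset; this byte has to be transferred.
            literal += 1
            pos += 1
    return literal
```

With old = bytes(range(64)), inserting three new bytes in one place
costs only 11 literal bytes, because the blocks after the insertion are
found again at their shifted offsets. Changing one byte in every block
costs all 64 bytes, since no block of the old file survives intact.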

So I don't see a way to make the re-sync work the way you expect.

Regards,

Holger

--
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
