Re: pg_basebackup + delta base backups

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Christopher Pereira <kripper(at)imatronix(dot)cl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_basebackup + delta base backups
Date: 2020-08-15 19:17:03
Message-ID: 20200815191703.GJ29590@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Greetings,

* Christopher Pereira (kripper(at)imatronix(dot)cl) wrote:
> On 26-May-20 10:20, Stephen Frost wrote:
> >"out of sync" is a bit of an odd concept, but having a replica fall
> >behind a long way is certainly something that can happen and may require
> >a rebuild from a backup (or from a new sync off of the primary in some
> >other way, as you suggest below). In a situation where there's async
> >replication happening and you promote a replica to take over, that's
> >definitely a case where you might also have to rebuild the former
> >primary.
>
> Yes, a common case with async streaming is when primary (A) goes down and
> replica is promoted as a new master (B).
> Then A comes back and has some data that was not streamed to B so pg_rewind
> is useless.
>
> I wonder if there is some option to just discard this branched data from A
> in order to start as a new replica.

... This is exactly what pg_rewind is for, so I'm not really sure what
you're asking about here.

> I noticed that pg_rewind is useless even when both DBs are identical
> (according to pg_dumpall | md5sum).

That isn't really a meaningful or useful thing to consider.

> >>As you said, all the pieces are there and it would be quite easy to write a
> >>new "pg_basebackup_delta" script that could be executed on the standby host
> >>to:
> >>
> >>1) setup a pgBackRest repo on the primary host (via SSH)
> >>
> >>2) create a backup on the primary host (via SSH)
> >>
> >>3) do a delta restore on the standby
> >>
> >>Even when the repository on the primary host is only created temporarily
> >>(and require double storage, resources, etc), it may still be worth
> >>considering the traffic that can be saved by doing a delta restore on a
> >>standby host in a different region, right?
> >So... There's actually a way to do this with pgbackrest, but it doesn't
> >support the delta capability.
>
> If I understood correctly the method you described, you were basically doing
> a "backup" between A (primary) and B (repo) and in such a way the repo is
> then compatible with the pg_data structure, but without delta support (ie.
> transfering the whole database)?

Yes.

> Delta support is critical for VLDBs, so I see two alternatives to replace
> pg_basebackup with pgbackrest to rebuild a replica:
>
> 1) Create a temporary repo on the primary
>
> 2) Create a temporary repo on the replica

Yes, if you use a repo then you can use pgbackrest's incremental and
delta features.

> All configurations would be undone after the replica has been rebuilt and
> both alternatives would be using delta over the wire.
> In your opinion, which alternative is better considering network traffic?

If you want to minimize network traffic then probably using either
incremental backups or delta restore would make the most sense.

Thanks,

Stephen

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Andrus 2020-08-16 06:32:14 How to distribute products to shop by amount of sales
Previous Message Mark Phillips 2020-08-15 19:14:29 Re: serial + db key, or guid?