Re: finding changed blocks using WAL scanning

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-11 17:00:35
Message-ID: CALfoeis0qOyGk+KQ3AbkfRVv=XbsSecqHfKSag=i_SLWMT+B0A@mail.gmail.com

On Thu, Apr 11, 2019 at 6:27 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Apr 11, 2019 at 3:52 AM Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> > I had in mind that you could have different overlapping incremental
> > backup jobs in existence at the same time. Maybe a daily one to a
> > nearby disk and a weekly one to a faraway cloud. Each one of these
> > would need a separate replication slot, so that the information that is
> > required for *that* incremental backup series is preserved between runs.
> > So just one reserved replication slot that feeds the block summaries
> > wouldn't work. Perhaps what would work is a flag on the replication
> > slot itself "keep block summaries for this slot". Then when all the
> > slots with the block summary flag are past an LSN, you can clean up the
> > summaries before that LSN.
>
> I don't think that quite works. There are two different LSNs. One is
> the LSN of the oldest WAL archive that we need to keep around so that
> it can be summarized, and the other is the LSN of the oldest summary
> we need to keep around so it can be used for incremental backup
> purposes. You can't keep both of those LSNs in the same slot.
> Furthermore, the LSN stored in the slot is defined as the amount of
> WAL we need to keep, not the amount of something else (summaries) that
> we need to keep. Reusing that same field to mean something different
> sounds inadvisable.
>
> In other words, I think there are two problems which we need to
> clearly separate: one is retaining WAL so we can generate summaries,
> and the other is retaining summaries so we can generate incremental
> backups. Even if we solve the second problem by using some kind of
> replication slot, we still need to solve the first problem somehow.
>

Just a thought on the first problem (it may not be simpler): could the
replication slot be enhanced to define a maximum amount X of WAL to retain,
so that once that limit is reached, the summary is collected and the WAL
can be deleted?
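
To make the idea a bit more concrete, here is a rough Python sketch. It is
purely illustrative and not based on any existing PostgreSQL code:
`SlotSketch`, `max_retained_bytes`, and the `changed_blocks_between` callback
are hypothetical names, and LSNs are modeled as plain integers. The slot
keeps WAL until the retained amount reaches the limit, then records a block
summary for that range and moves its restart point forward so the older WAL
could be removed.

```python
from dataclasses import dataclass, field


@dataclass
class SlotSketch:
    """Hypothetical slot that retains at most max_retained_bytes of WAL."""

    max_retained_bytes: int
    restart_lsn: int = 0  # oldest WAL position this slot still needs
    # Each summary is (start_lsn, end_lsn, frozenset of changed blocks).
    summaries: list = field(default_factory=list)

    def advance(self, current_lsn, changed_blocks_between):
        """Summarize and release WAL once the retention limit is reached.

        changed_blocks_between(start, end) is an assumed callback that scans
        WAL in [start, end) and returns the blocks modified in that range.
        Returns True if a summary was written, False otherwise.
        """
        if current_lsn - self.restart_lsn < self.max_retained_bytes:
            return False  # still under the limit, keep retaining WAL

        blocks = changed_blocks_between(self.restart_lsn, current_lsn)
        self.summaries.append(
            (self.restart_lsn, current_lsn, frozenset(blocks))
        )
        # WAL older than current_lsn is no longer needed by this slot.
        self.restart_lsn = current_lsn
        return True
```

Note that this only covers retaining WAL until it is summarized. How long
the summaries themselves must be kept for incremental backups is the
separate, second problem and is not handled here.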
