
Re: finding changed blocks using WAL scanning

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: finding changed blocks using WAL scanning
Date:	2019-04-18 20:25:24
Message-ID:	CA+TgmoZLU7FzEy3rKW54fi8CxDxaQtgYVyrravdLDdvJmcvxSQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 18, 2019 at 3:51 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> How would you choose the STARTLSN/ENDLSN? If you could do it per
> checkpoint, rather than per-WAL, I think that would be great.

I thought of that too. It seems appealing, because you probably only
really care whether a particular block was modified between one
checkpoint and the next, not exactly when during that interval it was
modified. However, the simple algorithm of "just stop when you get to
a checkpoint record" does not work, because the checkpoint record
itself points back to a much earlier LSN, and I think that it's that
earlier LSN that is interesting. So if you want to make this work you
have to be more clever, and I'm not sure I'm clever enough.
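To make the problem concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: `WalRecord`, its fields, and both helper functions are hypothetical names invented for illustration, and each record is assumed to touch a single block. It only shows that cutting at the checkpoint record and cutting at the LSN the checkpoint points back to give different block sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalRecord:
    """Toy WAL record (hypothetical structure, not the real on-disk format)."""
    lsn: int                         # position of this record in the WAL
    kind: str                        # "block" or "checkpoint"
    block: Optional[int] = None      # block touched, for "block" records
    redo_lsn: Optional[int] = None   # for "checkpoint": the earlier LSN it points to


def blocks_cut_at_checkpoint_record(records):
    """Naive rule: collect blocks until a checkpoint record is reached."""
    seen = set()
    for rec in records:
        if rec.kind == "checkpoint":
            break
        seen.add(rec.block)
    return seen


def blocks_before_redo(records):
    """Blocks modified strictly before the checkpoint's redo pointer."""
    ckpt = next(r for r in records if r.kind == "checkpoint")
    return {r.block for r in records
            if r.kind == "block" and r.lsn < ckpt.redo_lsn}


wal = [
    WalRecord(lsn=100, kind="block", block=1),
    WalRecord(lsn=200, kind="block", block=2),   # the redo pointer lands here
    WalRecord(lsn=300, kind="block", block=3),
    WalRecord(lsn=400, kind="checkpoint", redo_lsn=200),
]

naive = blocks_cut_at_checkpoint_record(wal)
by_redo = blocks_before_redo(wal)
misattributed = naive - by_redo   # blocks the naive cut puts in the wrong interval
```

In this toy WAL the naive cut also sweeps in the blocks written between the redo pointer and the checkpoint record. Note that `blocks_before_redo` looks ahead to find the checkpoint; a scanner reading the WAL in one forward pass learns the redo pointer only after it has already passed that LSN, which is presumably where the extra cleverness would be needed.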

I think it's important that a .modblock file not be too large, because
then it will use too much memory, and that it not cover too much WAL,
because then it will be too imprecise about when the blocks were
modified. Perhaps we should have a threshold for each -- e.g. emit
the next .modblock file after finding 2^20 distinct block references
or scanning 1GB of WAL. Then individual files would probably be in
the single-digit numbers of megabytes in size, assuming we do a decent
job with the compression, and you never need to scan more than 1GB of
WAL to regenerate one. If the starting point for a backup falls in
the middle of such a file, and you include the whole file, at worst
you have ~8GB of extra blocks to read (2^20 blocks at the default 8kB
block size), but in most cases less, because your writes probably have
some locality and the file may not actually contain the full 2^20
block references. You could also make it more fine-grained than that
if you don't mind having more, smaller files floating around.
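The two-threshold rule could be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not a proposed implementation: the function name, the `(end_lsn, block_id)` input shape, and the one-block-per-record simplification are all hypothetical, and the .modblock file format itself is not modelled. The constants are the figures from the paragraph above plus the default 8kB block size.

```python
BLOCK_SIZE = 8192           # default PostgreSQL block size, in bytes
MAX_BLOCK_REFS = 2 ** 20    # distinct block references per file
MAX_WAL_BYTES = 1 << 30     # 1GB of WAL per file


def split_into_modblock_files(records, start_lsn=0,
                              max_refs=MAX_BLOCK_REFS,
                              max_wal=MAX_WAL_BYTES):
    """Group WAL records into (start_lsn, end_lsn, blocks) chunks.

    `records` is an iterable of (end_lsn, block_id) pairs in WAL order,
    where end_lsn is the position just past the record (an assumed,
    simplified shape). A chunk is closed as soon as either threshold
    is reached, and the next chunk starts where the previous one ended.
    """
    file_start = start_lsn
    end_lsn = start_lsn
    blocks = set()
    for end_lsn, block_id in records:
        blocks.add(block_id)
        if len(blocks) >= max_refs or end_lsn - file_start >= max_wal:
            yield file_start, end_lsn, blocks
            file_start, blocks = end_lsn, set()
    if blocks:
        # Trailing partial chunk that has not hit either threshold yet.
        yield file_start, end_lsn, blocks


# Tiny thresholds so the behaviour is visible on a handful of records.
demo = [(10, "a"), (20, "b"), (30, "a"), (40, "c"), (50, "d")]
by_refs = list(split_into_modblock_files(demo, max_refs=2, max_wal=1000))
by_wal = list(split_into_modblock_files(
    [(10, "a"), (20, "a"), (30, "a")], max_refs=100, max_wal=20))

# Worst case when a backup start falls mid-file and the whole file is used.
worst_case_extra_bytes = MAX_BLOCK_REFS * BLOCK_SIZE
```

The last line is just the arithmetic behind the ~8GB figure: 2^20 blocks times 8192 bytes is 2^33 bytes. Lowering either threshold trades that worst case for a larger number of files.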

It would definitely be better if we could set things up so that we
could always switch to the next .modblock file when we cross a
potential redo start point, but they're not noted in the WAL so I
don't see how to do that. I don't know if it would be possible to
insert some new kind of log record concurrently with fixing the redo
location, so that redo always started at a record of this new type.
That would certainly be helpful for this kind of thing.
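If such a record type existed, the scanner's job would get simple. The sketch below, in Python, assumes a made-up record kind called "redo_start" that marks every potential redo location; no such record is described here as existing, and the function name and tuple layout are likewise hypothetical.

```python
def split_at_redo_markers(records, start_lsn=0):
    """Cut a new file at every hypothetical "redo_start" marker record.

    `records` is an iterable of (lsn, kind, block_id) in WAL order, where
    kind is "block" or "redo_start" (an assumed, simplified shape).
    Returns a list of (start_lsn, end_lsn, blocks); the last entry has
    end_lsn=None because that file is still open.
    """
    files = []
    file_start, blocks = start_lsn, set()
    for lsn, kind, block_id in records:
        if kind == "redo_start":
            # Close the current file exactly at the redo location.
            files.append((file_start, lsn, blocks))
            file_start, blocks = lsn, set()
        else:
            blocks.add(block_id)
    files.append((file_start, None, blocks))
    return files


wal = [
    (100, "block", 1),
    (200, "redo_start", None),
    (300, "block", 2),
    (400, "block", 3),
    (500, "redo_start", None),
    (600, "block", 1),
]
files = split_at_redo_markers(wal)
```

With boundaries aligned this way, a backup starting at a redo location would never land in the middle of a file, so the whole-file overread discussed earlier would not arise. A size or WAL-volume threshold could still be layered on top if intervals between markers turned out to be long.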

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: finding changed blocks using WAL scanning at 2019-04-18 19:51:57 from Bruce Momjian

Responses

Re: finding changed blocks using WAL scanning at 2019-04-18 21:47:56 from Bruce Momjian
Re: finding changed blocks using WAL scanning at 2019-04-22 23:04:25 from Tomas Vondra
