Re: finding changed blocks using WAL scanning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-24 17:56:29
Message-ID: CA+TgmoaU+90sNyT6ek64GnVhYdPX1O7UAtU+58-h3MAJjA9s-w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 24, 2019 at 10:10 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >I'm still interested in the answer to this question, but I don't see a
> >reply that specifically concerns it. Apologies if I have missed one.
>
> I don't think prefetching WAL blocks is all that important. The WAL
> segment was probably received fairly recently (either from primary or
> archive) and so it's reasonable to assume it's still in page cache. And
> even if it's not, sequential reads are handled by readahead pretty well.
> Which is a form of prefetching.

True. But if you are going to need to read the WAL anyway to apply
it, why shouldn't the prefetcher just read it first and use that to
drive prefetching, instead of using the modblock files? It's strictly
less I/O, because you were going to read the WAL files anyway and now
you don't have to also read some other modblock file, and it doesn't
really seem to have any disadvantages.
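The idea above can be sketched roughly as follows. This is an illustrative sketch only, not PostgreSQL's actual WAL reader: `WalRecord`, `issue_prefetch`, and the decoded-record shape are all hypothetical stand-ins for a reader that runs a window ahead of the apply position.

```python
from collections import namedtuple

# Hypothetical decoded WAL record: its LSN plus the blocks it touches.
WalRecord = namedtuple("WalRecord", ["lsn", "blocks"])  # blocks: [(rel, blkno)]

def prefetch_from_wal(records, apply_lsn, lookahead, issue_prefetch):
    """Scan WAL records ahead of the apply position and issue a prefetch
    hint for each block they touch, deduplicating within the window."""
    seen = set()
    for rec in records:
        if rec.lsn <= apply_lsn:
            continue                      # already applied, nothing to gain
        if rec.lsn > apply_lsn + lookahead:
            break                         # stay inside the lookahead window
        for blk in rec.blocks:
            if blk not in seen:           # don't hint the same block twice
                seen.add(blk)
                issue_prefetch(blk)       # e.g. posix_fadvise(POSIX_FADV_WILLNEED)
    return seen
```

The point of the sketch is that the only input is the WAL itself, which recovery must read anyway; no separate modblock file is consulted.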

> >It's pretty clear in my mind that what I want to do here is provide
> >approximate information, not exact information. Being able to sort
> >and deduplicate in advance seems critical to be able to make something
> >like this work on high-velocity systems.
>
> Do you have any analysis / data to support that claim? I mean, it's
> obvious that sorting and deduplicating the data right away makes
> subsequent processing more efficient, but it's not clear to me that not
> doing it would make it useless for high-velocity systems.

I did include some analysis of this point in my original post. It
does depend on your assumptions. If you assume that users will be OK
with memory usage that runs into the tens of gigabytes when the amount
of change since the last incremental backup is very large, then there
is probably no big problem, but that assumption sounds shaky to me.
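To make the order of magnitude concrete, here is some back-of-envelope arithmetic. Every number in it (record size, fraction of distinct blocks, per-entry hash-table cost) is an assumption chosen for illustration, not a measurement from this thread:

```python
# Back-of-envelope estimate of in-memory dedup cost for 1TB of WAL.
# All constants below are assumptions for illustration.
wal_bytes = 1 << 40              # 1 TB of WAL since the last incremental backup
avg_record_bytes = 128           # assumed average WAL record size
unique_fraction = 0.10           # assumed share of block references that are distinct
entry_bytes = 48                 # assumed per-entry hash-table cost
                                 # (relfilenode/fork/block key plus overhead)

references = wal_bytes // avg_record_bytes      # ~8.6 billion block references
unique_refs = int(references * unique_fraction)
memory_gb = unique_refs * entry_bytes / (1 << 30)
print(f"~{memory_gb:.0f} GB to hold the distinct block set in memory")
```

Under those assumptions the distinct-block set alone runs to tens of gigabytes, and a workload with less repetition pushes it higher still.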

(The customers I end up on the phone with are disproportionately those
running enormous systems on dramatically underpowered hardware, which is
not infrequently related to the reason I end up on the phone with them.)

> Sure, but that's not what I proposed elsewhere in this thread. My proposal
> was to keep modblock files "raw" for WAL segments that were not recycled yet
> (so ~3 last checkpoints), and deduplicate them after that. So the vast
> majority of the 1TB of WAL will have already deduplicated data.

OK, I missed that proposal. My biggest concern about this is that I
don't see how to square this with the proposal elsewhere on this
thread that these files should be put someplace that makes them
subject to archiving. If the files are managed by the master in a
separate directory it can easily do this sort of thing, but if they're
archived then you can't. Now maybe that's just a reason not to adopt
that proposal, but I don't see how to adopt both that proposal and
this one, unless we just say that we're going to spew craptons of tiny
little non-deduplicated modblock files into the archive.

> Also, maybe we can do partial deduplication, in a way that would be useful
> for prefetching. Say we only deduplicate 1MB windows - that would work at
> least for cases that touch the same page frequently (say, by inserting to
> the tail of an index, or so).

Maybe, but I'm not sure that's really optimal for any use case.
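For reference, the windowed-dedup idea quoted above amounts to something like the following sketch (the function name and input shape are hypothetical; the source describes the idea, not this code):

```python
def window_dedup(block_refs, window_size=1 << 20):
    """Deduplicate (wal_offset, block) references only within fixed-size
    1MB WAL windows, preserving overall WAL order so the output can
    still drive prefetching."""
    out = []
    seen = set()
    window_start = 0
    for offset, block in block_refs:
        if offset - window_start >= window_size:
            seen.clear()                           # new window: forget prior blocks
            window_start = offset - offset % window_size
        if block not in seen:                      # drop repeats inside the window
            seen.add(block)
            out.append((offset, block))
    return out
```

This compresses the hot-page case (repeated inserts into the tail of an index) while keeping the file cheap to produce incrementally, at the cost of leaving cross-window duplicates in place.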

> FWIW no one cares about low-velocity systems. While raw modblock files
> would not be an issue on them, it's also mostly uninteresting from the
> prefetching perspective. It's the high-velocity systems that have lag.

I don't think that's particularly fair. Low-velocity systems are some
of the best candidates for incremental backup, and people who are
running such systems probably care about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
