From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: finding changed blocks using WAL scanning |
Date: | 2019-04-23 14:22:46 |
Message-ID: | 20190423142246.GO6197@tamriel.snowman.net |
Lists: | pgsql-hackers |
Greetings,
* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On Sat, Apr 20, 2019 at 04:21:52PM -0400, Robert Haas wrote:
> >On Sat, Apr 20, 2019 at 12:42 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> >>> Oh. Well, I already explained my algorithm for doing that upthread,
> >>> which I believe would be quite cheap.
> >>>
> >>> 1. When you generate the .modblock files, stick all the block
> >>> references into a buffer. qsort(). Dedup. Write out in sorted
> >>> order.
> >>
> >>Having all of the block references in a sorted order does seem like it
> >>would help, but would also make those potentially quite a bit larger
> >>than necessary (I had some thoughts about making them smaller elsewhere
> >>in this discussion). That might be worth it though. I suppose it might
> >>also be possible to line up the bitmaps suggested elsewhere to do
> >>essentially a BitmapOr of them to identify the blocks changed (while
> >>effectively de-duping at the same time).
> >
> >I don't see why this would make them bigger than necessary. If you
> >sort by relfilenode/fork/blocknumber and dedup, then references to
> >nearby blocks will be adjacent in the file. You can then decide what
> >format will represent that most efficiently on output. Whether or not
> >a bitmap is better idea than a list of block numbers or something else
> >depends on what percentage of blocks are modified and how clustered
> >they are.
>
> Not sure I understand correctly - do you suggest to deduplicate and sort
> the data before writing them into the .modblock files? Because the
> sorting would make this information mostly useless for the recovery
> prefetching use case I mentioned elsewhere. For that to work we need
> information about both the LSN and block, in the LSN order.
I'm not sure I follow- why does the prefetching need to get the blocks
in LSN order..? Once the blocks that we know are going to change in the
next segment have been identified, we could prefetch them all and have
them ready for when replay gets to them. I'm not sure that we
specifically need to have them pre-fetched in the same order that the
replay happens and it might even be better to fetch them in an order
that's as sequential as possible to get them in as quickly as possible.
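
As a rough illustration of the ordering point above, here is a minimal Python sketch, not PostgreSQL code. The tuple layout (lsn, relfilenode, fork, blkno) and the function name prefetch_order are assumptions made up for this example. It takes the block references of one segment as they appear in LSN order, drops repeats, and sorts by relfilenode/fork/block so that a prefetcher could issue reads as sequentially as possible:

```python
from typing import Iterable, List, Tuple

# Hypothetical layout of one block reference: (lsn, relfilenode, fork, blkno).
BlockRef = Tuple[int, int, int, int]


def prefetch_order(refs: Iterable[BlockRef]) -> List[Tuple[int, int, int]]:
    """Return each distinct block once, sorted by relfilenode/fork/block."""
    # Dropping the LSN and collecting into a set removes repeated
    # references to the same block within the segment.
    distinct = {(rel, fork, blk) for _lsn, rel, fork, blk in refs}
    return sorted(distinct)


# Made-up references, listed in the order replay would encounter them.
refs_in_lsn_order = [
    (100, 16384, 0, 7),
    (110, 16385, 0, 2),
    (120, 16384, 0, 3),
    (130, 16384, 0, 7),  # same block touched again later in the segment
    (140, 16384, 0, 4),
]

for block in prefetch_order(refs_in_lsn_order):
    print(block)
```

Whether reading in block order actually beats reading in LSN order would depend on the storage and on how far ahead of replay the prefetcher can run; the sketch only shows that the LSN is not needed to build the list.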
> So if we want to allow that use case to leverage this infrastructure, we
> need to write the .modfiles kinda "raw" and do this processing in some
> later step.
If we really need the LSN info for the blocks, then we could still
de-dup, picking the 'first modified in this segment at LSN X', or keep
both first and last, or I suppose every LSN if we really want, and then
have that information included with the other information about the
block. Downstream clients could then sort based on the LSN info if they
want to have a list of blocks in sorted-by-LSN-order.
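
A minimal Python sketch of the "de-dup but keep first and last LSN" variant follows. It is an illustration only: the record layout and the names dedup_with_lsn and by_first_lsn are hypothetical, and nothing here reflects an actual on-disk format. Each distinct block is stored once together with the first and last LSN that touched it in the segment, and a downstream client that wants LSN order sorts on the first LSN:

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layouts, chosen only for this example.
BlockRef = Tuple[int, int, int, int]  # (lsn, relfilenode, fork, blkno)
BlockKey = Tuple[int, int, int]       # (relfilenode, fork, blkno)
LsnRange = Tuple[int, int]            # (first_lsn, last_lsn)


def dedup_with_lsn(refs: Iterable[BlockRef]) -> Dict[BlockKey, LsnRange]:
    """Keep one entry per block, remembering the first and last LSN seen."""
    summary: Dict[BlockKey, LsnRange] = {}
    for lsn, rel, fork, blk in refs:
        key = (rel, fork, blk)
        if key in summary:
            first, last = summary[key]
            summary[key] = (min(first, lsn), max(last, lsn))
        else:
            summary[key] = (lsn, lsn)
    return summary


def by_first_lsn(summary: Dict[BlockKey, LsnRange]) -> List[BlockKey]:
    """What a downstream client could do to get blocks back in LSN order."""
    return sorted(summary, key=lambda key: summary[key][0])


refs = [
    (100, 16384, 0, 7),
    (110, 16385, 0, 2),
    (120, 16384, 0, 3),
    (130, 16384, 0, 7),  # second change to the same block
    (140, 16384, 0, 4),
]

summary = dedup_with_lsn(refs)
print(summary[(16384, 0, 7)])
print(by_first_lsn(summary))
```

Keeping every LSN per block would mean storing a list instead of a pair, which gives up most of the size benefit of de-duplicating.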
> Now, maybe the incremental backup use case is so much more important that
> the right thing to do is ignore this other use case, and I'm OK with that -
> as long as it's a conscious choice.
I'd certainly like to have a way to prefetch, but I'm not entirely sure
that it makes sense to combine it with this, so while I sketched out
some ideas about how to do that above, I don't want it to come across as
being a strong endorsement of the overall idea.

For pre-fetching purposes, for an async streaming replica, it seems like
the wal sender process could potentially just scan the WAL and have a
list of blocks ready to pass to the replica which are "this is what's
coming soon" or similar, rather than working with the modfiles at all.
Not sure if we'd always send that or if we wait for the replica to ask
for it. Though for doing WAL replay from the archive, being able to ask
for the modfile first to do prefetching before replaying the WAL itself
could certainly be beneficial, so maybe it does make sense to have that
information there too.. still not sure we really need it in LSN order
or that we need to prefetch in LSN order though.

Thanks!

Stephen