From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: WAL prefetch (another approach) |
Date: | 2020-11-18 05:10:31 |
Message-ID: | CA+hUKGJ1=pOiNjSgXYJnjE3OyRtp8tjMRcON256e5EFpzPpgtA@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Nov 14, 2020 at 4:13 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
> > On 11/13/20 3:20 AM, Thomas Munro wrote:
> > > I'm not really sure what to do about archive restore scripts that
> > > block. That seems to be fundamentally incompatible with what I'm
> > > doing here.
> >
> > IMHO we can't do much about that, except for documenting it - if the
> > prefetch can't work because of blocking restore script, someone has to
> > fix/improve the script. No way around that, I'm afraid.
>
> I'm a bit confused about what the issue here is- is the concern that a
> restore_command is specified that isn't allowed to run concurrently but
> this patch is intending to run more than one concurrently..? There's
> another patch that I was looking at for doing pre-fetching of WAL
> segments, so if this is also doing that we should figure out which
> patch we want..
The problem is that the recovery loop tries to look further ahead in
between applying individual records, which causes the restore script
to run, and if that blocks, we won't apply records that we already
have, because we're waiting for the next WAL file to appear. This
behaviour is on by default with my patch, so pg_standby will introduce
weird replay delays. We could think of some ways to fix that, with
meaningful return codes and periodic polling or something, I suppose,
but something feels a bit weird about it.
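
To make the "meaningful return codes and periodic polling" idea concrete,
here is a minimal stand-alone sketch in C. It is a toy simulation, not code
from the patch: the RestoreResult codes, the Replay struct and the functions
try_restore_next_segment() and replay() are all hypothetical names invented
for illustration. The point is only that a look-ahead step which reports
"not yet" instead of blocking lets the loop keep applying records it already
has.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for a restore attempt that never blocks. */
typedef enum { RESTORE_OK, RESTORE_NOT_YET, RESTORE_END } RestoreResult;

#define RECORDS_PER_SEGMENT 4   /* toy value, real segments vary */

typedef struct {
    int segments_total;     /* segments the archive will ever contain */
    int polls_until_ready;  /* failed polls before each later segment appears */
    int segments_restored;  /* segments fetched so far */
    int polls_left;         /* countdown for the segment being awaited */
    int records_decoded;    /* records available to apply */
    int records_applied;    /* records already replayed */
} Replay;

/* Assumed non-blocking variant of running the restore script once. */
static RestoreResult try_restore_next_segment(Replay *r)
{
    if (r->segments_restored == r->segments_total)
        return RESTORE_END;
    if (r->polls_left > 0) {
        r->polls_left--;
        return RESTORE_NOT_YET;     /* caller should poll again later */
    }
    r->segments_restored++;
    r->polls_left = r->polls_until_ready;
    return RESTORE_OK;
}

/*
 * Replay everything; return the number of iterations in which there was
 * nothing to apply and the next segment was not ready either. Real code
 * would sleep or wait on a latch there rather than spin.
 */
static int replay(Replay *r)
{
    int stalls = 0;
    bool archive_done = false;

    for (;;) {
        /* Look ahead, but never wait for the next file to appear. */
        if (!archive_done) {
            RestoreResult res = try_restore_next_segment(r);
            if (res == RESTORE_OK)
                r->records_decoded += RECORDS_PER_SEGMENT;
            else if (res == RESTORE_END)
                archive_done = true;
        }

        assert(r->records_applied <= r->records_decoded);
        if (r->records_applied < r->records_decoded)
            r->records_applied++;   /* apply a record we already have */
        else if (archive_done)
            break;                  /* nothing left to fetch or apply */
        else
            stalls++;               /* genuinely waiting for new WAL */
    }
    return stalls;
}
```

Calling replay() on a zero-initialised Replay with segments_total and
polls_until_ready set runs the simulation. With a blocking restore script
there is no RESTORE_NOT_YET case, so every look-ahead would delay the
records already decoded; in this sketch the loop only stalls once it has
truly run out of records.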
> I don't know that it's needed, but it feels likely that we could provide
> a better result if we consider making changes to the restore_command API
> (eg: have a way to say "please fetch this many segments ahead, and you
> can put them in this directory with these filenames" or something). I
> would think we'd be able to continue supporting the existing API and
> accept that it might not be as performant.
Hmm. Every time I try to think of a protocol change for the
restore_command API that would be acceptable, I go around the same
circle of thoughts about event flow and realise that what we really
need for this is ... a WAL receiver...
Here's a rebase over the recent commit "Get rid of the dedicated latch
for signaling the startup process." just to fix cfbot; no other
changes.
Attachment | Content-Type | Size |
---|---|---|
v14-0001-Add-pg_atomic_unlocked_add_fetch_XXX.patch | text/x-patch | 3.4 KB |
v14-0002-Improve-information-about-received-WAL.patch | text/x-patch | 7.8 KB |
v14-0003-Provide-XLogReadAhead-to-decode-future-WAL-recor.patch | text/x-patch | 60.0 KB |
v14-0004-Prefetch-referenced-blocks-during-recovery.patch | text/x-patch | 64.1 KB |
v14-0005-WIP-Avoid-extra-buffer-lookup-when-prefetching-W.patch | text/x-patch | 10.7 KB |