From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds) |
Date: | 2021-03-01 11:00:08 |
Message-ID: | CA+hUKGJXYSazAYmBN4_BO8A5YiL9doT2VZr3ELaCP0RYwXzuqA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Nov 16, 2020 at 8:56 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Thu, Sep 24, 2020 at 05:55:17PM +1200, Thomas Munro wrote:
> > Right, RestoreArchivedFile() uses system(), so I guess it can hang
> > around for a long time after unexpected postmaster exit on every OS if
> > the command waits. To respond to various kinds of important
> > interrupts, I suppose that'd ideally use something like
> > OpenPipeStream() and a typical WaitLatch() loop with CFI(). I'm not
> > sure what our policy is or should be for exiting while we have running
> > subprocesses. I guess that is a separate issue.
>
> - if (IsUnderPostmaster && !PostmasterIsAlive())
> + if (IsUnderPostmaster &&
> +#ifndef USE_POSTMASTER_DEATH_SIGNAL
> + count++ % 1024 == 0 &&
> +#endif
> + !PostmasterIsAlive())
> That's pretty hack-ish, still efficient. Could we consider a
> different approach like something relying on
> PostmasterIsAliveInternal() with repetitive call handling? This may
> not be the only place where we care about that, particularly for
> non-core code.
As far as I know there aren't any other places that do polling of
PostmasterIsAlive() in a loop like this. All others have been removed
from core code: either they already had a WaitLatch() or similar so it
we just had to add WL_EXIT_ON_PM_DEATH, or they do pure CPU-bound and
don't actively try to detect postmaster death. That's why it seems
utterly insane that here we try to detect it X million times per
second.
> No objections with the two changes from pg_usleep() to WaitLatch() so
> they could be applied separately first.
I thought about committing that first part, and got as far as
splitting the patch into two (see attached), but then I re-read
Fujii-san's message about the speed of promotion and realised that we
really should have something like a condition variable for walRcvState
changes. I'll look into that.
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Replace-some-sleep-poll-loops-with-WaitLatch.patch | text/x-patch | 4.3 KB |
v5-0002-Poll-postmaster-less-frequently-in-recovery.patch | text/x-patch | 2.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Joel Jacobson | 2021-03-01 11:13:56 | Re: Regex back-reference semantics and performance |
Previous Message | Michael Banck | 2021-03-01 10:12:49 | [PATCH] Add --create-only option to pg_dump/pg_dumpall |