From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> |
Subject: | Re: Parallel query hangs after a smart shutdown is issued |
Date: | 2020-08-12 19:28:26 |
Message-ID: | 296192.1597260506@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Thu, Aug 13, 2020 at 6:00 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> One other thing I changed here was to remove PM_WAIT_READONLY from the
>> set of states in which we'll allow promotion to occur or a new walreceiver
>> to start. I'm not convinced that either of those behaviors aren't
>> bugs; although if someone thinks they're right, we can certainly put
>> back PM_WAIT_CLIENTS in those checks. (But, for example, it does not
>> appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with
>> Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks
>> like confusingly dead code to me. If we do want to allow restarting
>> the walreceiver in this state, the Shutdown condition needs fixed.)
> If a walreceiver is allowed to run, why should it not be allowed to
> restart?
I'd come to about the same conclusion after thinking more, so v2
attached undoes that change. I think putting off promotion is fine
though; it'll get handled at the next postmaster start. (It looks
like the state machine would just proceed to exit anyway if we allowed
the promotion, but that's a hard-to-test state transition that we could
do without.)
> Yeah, I suppose that other test'd need to be Shutdown <=
> SmartShutdown, just like we do in SIGHUP_handler(). Looking at other
> places where we test Shutdown == NoShutdown, one that jumps out is the
> autovacuum wraparound defence stuff and the nearby
> PMSIGNAL_START_AUTOVAC_WORKER code.
Oh, excellent point! I'd not thought to look at tests of the Shutdown
variable, but yeah, those should be <= SmartShutdown if we want autovac
to continue to operate in this state.
I also noticed that where reaper() is dealing with startup process
exit(3), it unconditionally sets Shutdown = SmartShutdown which seems
pretty bogus; that variable's value should never be allowed to decrease,
but this could cause it. In the attached I did
StartupStatus = STARTUP_NOT_RUNNING;
- Shutdown = SmartShutdown;
+ Shutdown = Max(Shutdown, SmartShutdown);
TerminateChildren(SIGTERM);
But given that it's forcing immediate termination of all backends,
I wonder if that's not more like a FastShutdown? (Scary here is
that the coverage report shows we're not testing this path, so who
knows if it works at all.)
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
keep-bg-processes-alive-in-smart-shutdown-2.patch | text/x-diff | 9.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2020-08-12 19:54:40 | Re: use pg_get_functiondef() in pg_dump |
Previous Message | Thomas Munro | 2020-08-12 18:54:38 | Re: Parallel query hangs after a smart shutdown is issued |