From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | stopgap fix for signal handling during restore_command |
Date: | 2023-02-23 23:15:03 |
Message-ID: | 20230223231503.GA743455@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote:
> On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> I'm happy to create a new thread if needed, but I can't tell if there is
>> any interest in this stopgap/back-branch fix. Perhaps we should just jump
>> straight to the long-term fix that Thomas is looking into.
>
> Unfortunately the latch-friendly subprocess module proposal I was
> talking about would be for 17. I may post a thread fairly soon with
> design ideas + list of problems and decision points as I see them, and
> hopefully some sketch code, but it won't be a proposal for [/me checks
> calendar] next week's commitfest and probably wouldn't be appropriate
> in a final commitfest anyway, and I also have some other existing
> stuff to clear first. So please do continue with the stopgap ideas.
Okay, here is a new thread...
Since v8.4, the startup process will proc_exit() immediately within its
SIGTERM handler while the restore_command executes via system(). Some
recent changes added unsafe code to the section where this behavior is
enabled [0]. The long-term fix likely includes moving away from system()
completely, but we may want to have a stopgap/back-branch fix while that is
under development.
I've attached a patch set for a proposed stopgap fix. 0001 simply moves
the extra code outside of the Pre/PostRestoreCommand() block so that only
system() is executed while the SIGTERM handler might proc_exit(). This
restores the behavior that was in place from v8.4 to v14, so I don't expect
it to be too controversial. 0002 adds code to startup's SIGTERM handler to
call _exit() instead of proc_exit() if we are in a forked process from
system(), etc. It also adds assertions to ensure proc_exit(), ProcKill(),
and AuxiliaryProcKill() are not called within such forked processes.
Thoughts?
[0] https://postgr.es/m/20230201105514.rsjl4bnhb65giyvo%40alap3.anarazel.de
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Move-extra-code-out-of-the-Pre-PostRestoreCommand.patch | text/x-diff | 2.0 KB |
v6-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-f.patch | text/x-diff | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-02-23 23:16:50 | Re: Weird failure with latches in curculio on v15 |
Previous Message | Andrey Chudnovsky | 2023-02-23 23:04:22 | Re: [PoC] Federated Authn/z with OAUTHBEARER |