We shouldn't signal process groups with SIGQUIT

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: We shouldn't signal process groups with SIGQUIT
Date: 2023-02-14 20:29:27
Message-ID: 20230214202927.xgb2w6b7gnhq6tvv@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

The default action for SIGQUIT is to terminate the process and create a core
dump. We use SIGQUIT to implement immediate shutdowns, and we send the signal
to the entire process group.

The result of that is that we regularly produce core dumps for binaries like
sh/cp. I regularly see this on my local system, and I've seen it on CI. Recently
Thomas added logic to show core dumps happening in cfbot ([1]). There are plenty
of unrelated core dumps, but also lots in sh/cp ([2]).

We found a bunch of issues as part of [3], but I think the issue I'm
discussing here is separate.

ISTM that signal_child() should downgrade SIGQUIT to SIGTERM when sending to
the process group. That way we'd maintain the current behaviour for postgres
itself, but stop core-dumping archive/restore scripts (as well as other
subprocesses that e.g. trusted PLs might create).
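
To make the proposal concrete, here is a rough standalone sketch of the idea,
not the actual postmaster code. The names group_signal_for() and
signal_child_sketch() are made up for illustration, and the real
signal_child() has logging and platform checks that are left out here. It
assumes the child is the leader of its own process group, so kill(-pid, ...)
reaches the child's subprocesses.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: choose the signal delivered to the process group.
 * SIGQUIT is downgraded to SIGTERM so that subprocesses left at the default
 * disposition (sh, cp, ...) exit without dumping core. Every other signal
 * is passed through unchanged.
 */
int
group_signal_for(int sig)
{
    return (sig == SIGQUIT) ? SIGTERM : sig;
}

/*
 * Hypothetical stand-in for signal_child(): the child itself still gets the
 * requested signal, the rest of its process group gets the (possibly
 * downgraded) one. Returns 0 on success, -1 if either kill() call failed.
 */
int
signal_child_sketch(pid_t pid, int sig)
{
    int rc;

    assert(pid > 0);            /* a negative pid would address a group */

    rc = kill(pid, sig);        /* direct signal, unchanged behaviour */

    if (kill(-pid, group_signal_for(sig)) < 0)  /* whole process group */
        rc = -1;

    return rc;
}
```

Note that the child is a member of its own group, so in this sketch it would
receive SIGTERM in addition to the direct SIGQUIT; whether that matters would
need to be checked against the existing signal handlers.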

Makes sense?

Greetings,

Andres Freund

[1] http://cfbot.cputube.org/highlights/core.html

[2] A small sample:
https://api.cirrus-ci.com/v1/task/5939902693507072/logs/cores.log
https://api.cirrus-ci.com/v1/task/5549174150660096/logs/cores.log
https://api.cirrus-ci.com/v1/task/6153817767542784/logs/cores.log
https://api.cirrus-ci.com/v1/task/6567335205535744/logs/cores.log
https://api.cirrus-ci.com/v1/task/4804998119292928/logs/cores.log

[3] https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
