Re: Sending SIGABRT to child processes (was Re: Strange failure on mamba)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sending SIGABRT to child processes (was Re: Strange failure on mamba)
Date: 2022-11-21 17:04:21
Message-ID: 3212779.1669050261@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> I suspect that having a GUC would be a good idea. I needed something similar
>> recently, debugging an occasional hang in the AIO patchset. I first tried
>> something like your #define approach and it did cause a problematic flood of
>> core files.

> Yeah, the main downside of such a thing is the risk of lots of core files
> accumulating over repeated crashes. Nonetheless, I think it'll be a
> useful debugging aid. Here's a proposed patch. (I took the opportunity
> to kill off the long-since-unimplemented Reinit switch, too.)

Hearing no complaints, I've pushed this and reconfigured mamba to use
send_abort_for_kill. Once I've got a core file or two to look at,
I'll try to figure out what's going on there.
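For anyone following along: as I understand it, send_abort_for_kill only changes which signal the postmaster uses when it has to forcibly terminate a child that ignored the earlier, polite shutdown request. With the setting on, SIGABRT replaces SIGKILL, so the stuck process dumps core instead of vanishing without a trace. Here is a minimal sketch of that choice. The name forced_kill_signal is a hypothetical helper for illustration, not the postmaster's actual code.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical helper: pick the signal used to force down a child that did
 * not exit after the earlier shutdown request.  With the debugging flag on,
 * SIGABRT (which normally produces a core file) replaces SIGKILL.
 */
static int
forced_kill_signal(bool send_abort_for_kill)
{
	return send_abort_for_kill ? SIGABRT : SIGKILL;
}
```

On a test machine, a line like `send_abort_for_kill = on` in postgresql.conf should be enough to turn it on.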

> One thing I'm not too clear on is whether we want to send SIGABRT to the
> child groups (i.e., SIGABRT grandchild processes too). I made signal_child
> do so here, but perhaps it's overkill.

After further thought, we do have to SIGABRT the grandchildren too,
or they won't shut down promptly. There might be a small risk of
some programs trapping SIGABRT and doing something other than what
we want, but since this is only a debugging aid, that's probably
tolerable.
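Concretely, signalling the grandchildren means calling kill() with the negated PID as well as the PID itself. As I understand it, this works because postmaster children call setsid() (where available) and therefore lead their own process groups. Below is a standalone demo of that pattern. It is not the postmaster's signal_child(); signal_child_and_group and demo_abort_process_group are hypothetical names. The child sets RLIMIT_CORE to zero so that the demo itself asks the kernel not to write core files.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signal the child itself, then every process in its group (negative pid). */
static void
signal_child_and_group(pid_t pid, int sig)
{
	if (kill(pid, sig) < 0)
		fprintf(stderr, "kill(%ld,%d) failed: %s\n",
				(long) pid, sig, strerror(errno));
	if (kill(-pid, sig) < 0)
		fprintf(stderr, "kill(%ld,%d) failed: %s\n",
				(long) -pid, sig, strerror(errno));
}

/*
 * Demo: the child becomes a process-group leader via setsid() and spawns a
 * grandchild; both keep the pipe's write end open.  Returns 0 if SIGABRT
 * killed the child and the pipe hit EOF (so the grandchild is gone too).
 */
static int
demo_abort_process_group(void)
{
	int		fds[2];
	char	c;
	int		status;
	pid_t	pid;

	if (pipe(fds) < 0 || (pid = fork()) < 0)
		return -1;
	if (pid == 0)
	{
		struct rlimit nocore = {.rlim_cur = 0, .rlim_max = 0};
		pid_t	gpid;

		close(fds[0]);
		setrlimit(RLIMIT_CORE, &nocore);	/* ask for no core files */
		signal(SIGABRT, SIG_DFL);		/* inherited by the grandchild */
		setsid();						/* new group, pgid == our pid */
		if ((gpid = fork()) < 0)
			_exit(1);
		if (gpid == 0)
			for (;;)
				pause();				/* grandchild: wait to be signalled */
		if (write(fds[1], "r", 1) != 1)	/* tell the parent we're ready */
			_exit(1);
		for (;;)
			pause();
	}

	close(fds[1]);
	if (read(fds[0], &c, 1) != 1)
		return -1;
	signal_child_and_group(pid, SIGABRT);
	/* read() returns 0 only after every writer, grandchild included, exits */
	while (read(fds[0], &c, 1) > 0)
		;
	close(fds[0]);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT) ? 0 : -1;
}
```

Without the kill(-pid, ...) call, the grandchild would keep the pipe open and the final read loop would block forever, which is the same "won't shut down promptly" symptom in miniature.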

regards, tom lane
