From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> |
Cc: | Michael Harris <harmic(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Undetected Deadlock |
Date: | 2022-02-03 16:36:27 |
Message-ID: | 272367.1643906187@sss.pgh.pa.us |
Lists: | pgsql-general |
Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> writes:
> On Thu, 3 Feb 2022 at 06:25, Michael Harris <harmic(at)gmail(dot)com> wrote:
>> Some of these functions trigger fetching of remote resources, for
>> which a timeout is set using `alarm`. The function unfortunately does
>> not re-establish any pre-existing interval timers after it is done,
>> which leads to postgresql missing it's own expected alarm signal.
>>
>> The reason that this was not affecting us on previous postgres
>> versions was this commit:
>>
>> https://github.com/postgres/postgres/commit/09cf1d52267644cdbdb734294012cf1228745aaa#diff-b12a7ca3bf9c6a56745844c2670b0b28d2a4237741c395dda318c6cc3664ad4a
>>
>> After this commit, once an alarm is missed, that backend never sets
>> one again, so no timeouts of any kind will work. Therefore, the
>> deadlock detector was never being run. Prior to that, the next time
>> any timeout was set by the backend it would re-establish its timer.
>>
>> We will of course fix our own code to prevent this issue, but I am a
>> little concerned at the above commit as it reduces the robustness of
>> postgres in this situation. Perhaps I will raise it on the
>> pgsql-hackers list.
> Hmm, so you turned off Postgres' alarms so they stopped working, and
> you're saying that is a robustness issue of Postgres?
If Michael's analysis were accurate, I'd agree that there is a robustness
issue, but I don't think there is. See timeout.c:220:

    /*
     * Get the time remaining till the nearest pending timeout. If it is
     * negative, assume that we somehow missed an interrupt, and force
     * signal_pending off. This gives us a chance to recover if the
     * kernel drops a timeout request for some reason.
     */
    nearest_timeout = active_timeouts[0]->fin_time;
    if (now > nearest_timeout)
    {
        signal_pending = false;
        /* force an interrupt as soon as possible */
        secs = 0;
        usecs = 1;
    }

Now admittedly we don't have a good way to test this stanza, but
it should result in re-establishing the timer interrupt the next
time any timeout.c API is invoked after a missed interrupt.
I don't see anything more that we could or should do. We're
not going to issue setitimer() after every user-defined function
call.
regards, tom lane
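Since the backend will not re-arm its timer after every user-defined function call, the fix belongs in the extension code, as Michael planned. A minimal sketch, assuming that code can wrap its `alarm`-based timeout: save the pending `ITIMER_REAL` with `getitimer()`, run the call, then re-arm whatever time was left with `setitimer()`. `fetch_with_alarm_timeout` and `call_preserving_itimer` are hypothetical names; the former only stands in for a routine that uses `alarm()` and leaves the timer cancelled. That `alarm()` and `setitimer(ITIMER_REAL)` share one timer holds on Linux, but POSIX leaves their interaction unspecified, so this is a platform assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a third-party routine that uses alarm() for its
 * own timeout and leaves the real-time timer cancelled when it returns. */
static void fetch_with_alarm_timeout(unsigned int seconds)
{
    alarm(seconds);   /* replaces any ITIMER_REAL the caller had armed */
    /* ... the remote fetch would happen here ... */
    alarm(0);         /* cancels the timer, including the caller's */
}

/* Run fn(arg), then re-arm whatever ITIMER_REAL time the caller had left.
 * Returns 0 on success, -1 if getitimer()/setitimer() fails. */
static int call_preserving_itimer(void (*fn)(unsigned int), unsigned int arg)
{
    struct itimerval saved;
    struct timespec start, end;
    long long elapsed_us, remaining_us;

    if (getitimer(ITIMER_REAL, &saved) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &start);

    fn(arg);

    clock_gettime(CLOCK_MONOTONIC, &end);
    if (saved.it_value.tv_sec == 0 && saved.it_value.tv_usec == 0)
        return 0;     /* no timer was pending, so nothing to restore */

    /* Charge the time spent inside fn() against the caller's deadline. */
    elapsed_us = (long long) (end.tv_sec - start.tv_sec) * 1000000
                 + (end.tv_nsec - start.tv_nsec) / 1000;
    remaining_us = (long long) saved.it_value.tv_sec * 1000000
                   + saved.it_value.tv_usec - elapsed_us;
    if (remaining_us <= 0)
        remaining_us = 1;   /* deadline already passed: fire as soon as possible */

    saved.it_value.tv_sec = remaining_us / 1000000;
    saved.it_value.tv_usec = remaining_us % 1000000;
    return setitimer(ITIMER_REAL, &saved, NULL);
}
```

If the caller's deadline expired while the wrapped call held the timer, the sketch re-arms for 1 microsecond rather than dropping the interrupt, the same idea as the timeout.c stanza quoted above.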