From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net> |
Subject: | Re: libpqrcv_PQexec() seems to violate latch protocol |
Date: | 2017-06-06 21:51:37 |
Message-ID: | cd0cd76d-8057-529c-831b-903b1474435b@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 06/06/17 23:42, Andres Freund wrote:
> On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:
>> On 06/06/17 23:17, Andres Freund wrote:
>>> Right. I found a couple more instance of similarly iffy, although not
>>> quite as broken, patterns in launcher.c. It's easy to get this wrong,
>>> but it's a lot easy if you do it differently everywhere you use a
>>> latch. It's not good if code in the same file, by the same author(s),
>>> has different ways of using latches.
>>
>> Huh? I see same pattern everywhere in launcher.c, what am I missing?
>
> WaitForReplicationWorkerAttach:
> while (...)
> CHECK_FOR_INTERRUPTS();
> /* other stuff including returns */
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
>
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff */
> CHECK_FOR_INTERRUPTS()
> WaitLatch()
> POSTMASTER_DEATH
> ResetLatch()
> /* other stuff including returns */
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff including returns */
> CHECK_FOR_INTERRUPTS();
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
>
> ApplyLauncherMain:
> while (!got_SIGTERM)
> /* lots other stuff */
> WaitLatch()
> WL_POSTMASTER_DEATH
> /* some other stuff */
> ResetLatch()
> (note no CFI)
>
> they're not hugely different, but subtely there are differences.
> Sometimes you're guaranteed to check for interrupts after resetting the
> latch, in other cases not. Sometimes expensive-ish things happen before
> a CFI...
>
Ah that's because signals in launcher are broken, see
https://www.postgresql.org/message-id/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
which also includes patch to fix it.
We originally had custom signal handling everywhere, then I realized it
was mistake for workers because of locking interaction but missed same
issue with launcher (the CFI in current launcher doesn't work).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2017-06-06 22:47:53 | Re: PG10 transition tables, wCTEs and multiple operations on the same table |
Previous Message | Mike Palmiotto | 2017-06-06 21:44:16 | Re: BUG #14682: row level security not work with partitioned table |