Re: libpqrcv_PQexec() seems to violate latch protocol

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: libpqrcv_PQexec() seems to violate latch protocol
Date: 2017-06-06 21:51:37
Message-ID: cd0cd76d-8057-529c-831b-903b1474435b@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/06/17 23:42, Andres Freund wrote:
> On 2017-06-06 23:24:50 +0200, Petr Jelinek wrote:
>> On 06/06/17 23:17, Andres Freund wrote:
>>> Right. I found a couple more instance of similarly iffy, although not
>>> quite as broken, patterns in launcher.c. It's easy to get this wrong,
>>> but it's a lot easy if you do it differently everywhere you use a
>>> latch. It's not good if code in the same file, by the same author(s),
>>> has different ways of using latches.
>>
>> Huh? I see same pattern everywhere in launcher.c, what am I missing?
>
> WaitForReplicationWorkerAttach:
> while (...)
> CHECK_FOR_INTERRUPTS();
> /* other stuff including returns */
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
>
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff */
> CHECK_FOR_INTERRUPTS()
> WaitLatch()
> POSTMASTER_DEATH
> ResetLatch()
> /* other stuff including returns */
> logicalrep_worker_stop loop 1:
> while (...)
> /* other stuff including returns */
> CHECK_FOR_INTERRUPTS();
> WaitLatch()
> WL_POSTMASTER_DEATH
> ResetLatch()
>
> ApplyLauncherMain:
> while (!got_SIGTERM)
> /* lots other stuff */
> WaitLatch()
> WL_POSTMASTER_DEATH
> /* some other stuff */
> ResetLatch()
> (note no CFI)
>
> they're not hugely different, but subtely there are differences.
> Sometimes you're guaranteed to check for interrupts after resetting the
> latch, in other cases not. Sometimes expensive-ish things happen before
> a CFI...
>

Ah that's because signals in launcher are broken, see
https://www.postgresql.org/message-id/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
which also includes patch to fix it.

We originally had custom signal handling everywhere, then I realized it
was mistake for workers because of locking interaction but missed same
issue with launcher (the CFI in current launcher doesn't work).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-06-06 22:47:53 Re: PG10 transition tables, wCTEs and multiple operations on the same table
Previous Message Mike Palmiotto 2017-06-06 21:44:16 Re: BUG #14682: row level security not work with partitioned table