From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Broken order-of-operations in parallel query latch manipulation |
Date: | 2016-08-01 04:07:25 |
Message-ID: | CAA4eK1J8cP0ceOBZodcbBfkPz7zhM28-OwJH+pyeYHjsaznefQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Aug 1, 2016 at 1:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Both shm_mq.c and nodeGather.c contain instances of this coding pattern:
>
> WaitLatch(MyLatch, WL_LATCH_SET, 0);
> CHECK_FOR_INTERRUPTS();
> ResetLatch(MyLatch);
>
> I believe this is wrong and the CHECK_FOR_INTERRUPTS needs to be before
> or after the two latch operations. As-is, if the reason somebody set
> our latch was to get us to notice that a CHECK_FOR_INTERRUPTS condition
> happened, there's a race condition where we'd fail to realize that.
>
I could see that in nodeGather.c, it might fail to notice the SetLatch
done by worker process or spuriously woken up due to SetLatch for some
unrelated reason. However, I don't see what problem it can cause
apart from one extra loop cycle where it will try to process the tuple
when actually there is no tuple in the queue.
> Other places such as ProcWaitForSignal() do it that way; only recently
> introduced (and unproven in the field) code has it like this.
>
> Anyone want to argue it's okay as-is?
>
I don't want to argue to the change to make it same as what we do at
other places, but want to understand the problem you are seeing with
current coding pattern in nodeGather.c
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Khandekar | 2016-08-01 05:14:56 | Re: asynchronous and vectorized execution |
Previous Message | Thomas Munro | 2016-08-01 00:52:06 | Re: Combining hash values |