From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Gustavsson Mikael <mikael(dot)gustavsson(at)smhi(dot)se>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Svensson Peter <peter(dot)svensson(at)smhi(dot)se>, Almen Anders <anders(dot)almen(at)smhi(dot)se> |
Subject: | Re: SV: Problem with pg_notify / listen |
Date: | 2020-11-27 22:13:28 |
Message-ID: | 730247.1606515208@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-general |
I wrote:
> * Change the code back to being atomic, ie go ahead and update
> QUEUE_TAIL immediately, and truncate the SLRU only afterward.
> Why is it necessary, or even safe, to perform the SLRU truncation
> before changing QUEUE_TAIL? (IOW, I wonder if there isn't a bug
> there in HEAD too.)
After thinking more about that, I'm pretty sure there is a bug there:
a newly-arriving backend could attempt to scan the queue starting at
QUEUE_TAIL, concurrently with SimpleLruTruncate removing the page(s)
it wants to scan. In typical cases no failure will occur because
Exec_ListenPreCommit could advance its queue pointer to a safe place
by observing the pointers of other backends. However, if the new
listener were the only one in its database, we'd have trouble.
What seems like the most expedient way to fix this is to separate
QUEUE_TAIL into two variables, one that is the "logical" tail
position from which new backends may start to scan the queue,
and one that is the "physical" tail position, ie, the oldest
page we have not yet truncated. The physical tail need only be
tracked to page accuracy, so it can be a plain int not a QUEUE_POS.
asyncQueueAdvanceTail should update QUEUE_TAIL immediately, which
restores correct behavior pre-v13 and also dodges the race condition
described above. But we don't update the physical tail pointer
until we've completed SimpleLruTruncate, keeping things honest
with respect to asyncQueueIsFull.
As a minor side benefit, asyncQueueAdvanceTail doesn't have to take
the NotifyQueueLock twice unless it actually does an SLRU truncation.
In short, I propose the attached patch (against HEAD, but the
same logic changes should work in the back branches).
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-notify-race-conditions.patch | text/x-diff | 4.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2020-11-28 03:59:06 | Re: SV: Problem with pg_notify / listen |
Previous Message | Tom Lane | 2020-11-27 19:07:42 | Re: BUG #16749: EXPLAIN only shows filtering from the first query in a union chain when using distinct on. |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2020-11-28 03:59:06 | Re: SV: Problem with pg_notify / listen |
Previous Message | David G. Johnston | 2020-11-27 21:25:25 | Re: postgres_fdw insert extremely slow |