From: | Joachim Wieland <joe(at)mcknight(dot)de> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Listen / Notify - what to do when the queue is full |
Date: | 2009-11-19 22:04:07 |
Message-ID: | dc7b844e0911191404p608c789o4b8a211326750066@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Nov 19, 2009 at 6:55 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Hmm, ignoring 2PC for a moment, I think the patch suffers from a little
> race condition:
>
> Session 1: BEGIN;
> Session 1: INSERT INTO foo ..;
> Session 1: NOTIFY 'foo';
> Session 1: COMMIT -- commit begins
> Session 1: [commit processing runs AtCommit_NotifyBeforeCommit()]
----> Session 2 must not read uncommitted notifications selectively
> Session 2: LISTEN 'foo';
> Session 2: SELECT * FROM foo;
> Session 1: [AtCommit_NotifyAfterCommit() signals listening backends]
> Session 2: [waits for notifications]
>
> Because session 2 began listening after session 1 had already sent its
> notifications, it missed them.
I think you are right. However, note that session 1 does not actively
send notifications to anybody; it just puts them into the queue. It is
every backend's own job to process the queue and decide which messages
are interesting and which are not. The example you brought up fails if
session 2 discards notifications based on the set of channels it is
listening to at the moment it reads them. If I understand you
correctly, you are suggesting that a reading backend either not read
uncommitted notifications from the queue at all, or read all
notifications (regardless of which channel they were sent to), so
that it can apply the check ("Am I listening on this channel?")
later on.
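To make the two options concrete, here is a toy Python model. It is only a sketch and not the patch's code: `Notification`, `Backend`, `read_filtering_early`, `read_all`, `filter_late` and the `xid` field are all hypothetical names invented for illustration, and the shared queue is just a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    channel: str
    xid: int  # hypothetical: id of the transaction that queued the entry


@dataclass
class Backend:
    listening: set = field(default_factory=set)
    pos: int = 0                                   # read position in the queue
    pending: list = field(default_factory=list)    # read, not yet filtered
    delivered: list = field(default_factory=list)

    def read_filtering_early(self, queue):
        """Filter against the channel set as it is while reading."""
        while self.pos < len(queue):
            n = queue[self.pos]
            self.pos += 1
            if n.channel in self.listening:
                self.delivered.append(n)

    def read_all(self, queue):
        """Copy every entry regardless of channel; decide later."""
        while self.pos < len(queue):
            self.pending.append(queue[self.pos])
            self.pos += 1

    def filter_late(self, committed):
        """Apply the channel check once the writing transaction committed."""
        still_pending = []
        for n in self.pending:
            if n.xid not in committed:
                still_pending.append(n)
            elif n.channel in self.listening:
                self.delivered.append(n)
        self.pending = still_pending


def run(early):
    queue = [Notification("foo", xid=1)]  # session 1 queued this pre-commit
    session2 = Backend()
    if early:
        session2.read_filtering_early(queue)  # read before LISTEN: dropped
    else:
        session2.read_all(queue)
    session2.listening.add("foo")             # session 2: LISTEN foo
    if not early:
        session2.filter_late(committed={1})   # session 1 finished its commit
    return [n.channel for n in session2.delivered]
```

In this model `run(early=True)` delivers nothing, because the entry was discarded before the LISTEN took effect, while `run(early=False)` delivers the `foo` notification.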
> I think we could fix that by arranging things so that a backend refrains
> from advancing its own 'pos' beyond the first notification it has
> written itself, until commit is completely finished.
In the end this is similar to the idea of not reading uncommitted
notifications, which is what I did at the beginning. However, you then
run into a full queue a lot faster. Imagine a queue length of 1000
with 3 transactions writing 400 notifications each... All three might
fail if they run in parallel, even though there would be enough space
for at least two of them, and if they were executed in sequence, all
of them could deliver their notifications.
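The arithmetic can be checked with a small simulation. This is a sketch under simplifying assumptions that are not part of the patch: writers are interleaved strictly round-robin, one notification at a time, and uncommitted entries never free queue space. The function names `run_parallel` and `run_sequential` are made up for this example.

```python
def run_parallel(capacity=1000, writers=3, per_txn=400):
    """Interleave writers round-robin; uncommitted entries are never freed.

    Returns one boolean per writer: did it manage to queue all its entries?
    """
    used = 0
    written = [0] * writers
    progress = True
    while progress:
        progress = False
        for w in range(writers):
            if written[w] < per_txn and used < capacity:
                written[w] += 1
                used += 1
                progress = True
    return [count == per_txn for count in written]


def run_sequential(capacity=1000, writers=3, per_txn=400):
    """Run the transactions one after another."""
    used = 0
    results = []
    for _ in range(writers):
        if used + per_txn <= capacity:
            used += per_txn   # write all notifications of this transaction
            used -= per_txn   # commit: readers consume them, space is freed
            results.append(True)
        else:
            results.append(False)
    return results
```

With the defaults, `run_parallel()` returns `[False, False, False]`: the queue fills up when each writer has queued roughly a third of the slots, so none reaches 400. `run_sequential()` returns `[True, True, True]`.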
Given your example, what I am proposing now is to stop reading from
the queue once we see a not-yet-committed notification, but once the
queue is full, to read the uncommitted notifications anyway, effectively
copying them over into the backend's own memory... Once the transaction
commits and sends a signal, we can process, send and discard the
previously copied notifications. In the above example, at some point
one, two or all three backends would see that the queue is full, and
each would read the uncommitted notifications of the others and
copy them into its own memory, so that space is freed in the queue.
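The reading rule proposed above could look roughly like this. It is a minimal sketch with a single reader, not the actual implementation: `Reader`, `process`, `on_commit_signal` and the `queue_full` flag are hypothetical, entries are plain `(xid, channel)` tuples, and channel filtering and aborted transactions are left out.

```python
class Reader:
    """Toy reading backend; queue entries are (xid, channel) tuples."""

    def __init__(self):
        self.pos = 0         # entries before pos no longer need queue space
        self.local = []      # uncommitted entries copied into private memory
        self.delivered = []

    def process(self, queue, committed, queue_full):
        while self.pos < len(queue):
            entry = queue[self.pos]
            xid, _channel = entry
            if xid in committed:
                self.delivered.append(entry)
            elif queue_full:
                self.local.append(entry)   # copy now, deliver after commit
            else:
                break                      # normal case: wait at this entry
            self.pos += 1

    def on_commit_signal(self, committed):
        """Deliver copied entries whose transaction has since committed."""
        still_waiting = []
        for entry in self.local:
            if entry[0] in committed:
                self.delivered.append(entry)
            else:
                still_waiting.append(entry)
        self.local = still_waiting
```

As long as `queue_full` is false, the reader stops at the first uncommitted entry and `pos` does not move. When the queue is full, it advances past the uncommitted entries and keeps private copies, which it delivers once `on_commit_signal` reports the writing transaction as committed.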
> That will handle 2PC as well. We can send the notifications in
> prepare-phase, and any LISTEN that starts after the prepare-phase will
> see the notifications because they're still in the queue. There is no
> risk of running out of disk space in COMMIT PREPARED, because the
> notifications have already been written to disk. However, the
> notification queue can't be truncated until the prepared transaction
> finishes; does anyone think that's a show-stopper?
Note that we don't preserve notifications when the database restarts.
But 2PC can cope with restarts. How would that fit together? Also, I am
not sure how you are going to deliver notifications that happen
between the PREPARE TRANSACTION and the COMMIT PREPARED (because you
have only one queue pointer, which you are not going to advance...).
Joachim