From: | Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, simon(dot)riggs(at)enterprisedb(dot)com, Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com> |
Subject: | Re: repeated decoding of prepared transactions |
Date: | 2021-02-20 10:55:19 |
Message-ID: | 21251661-f342-a2e1-05bc-77945d476562@enterprisedb.com |
Lists: | pgsql-hackers |
On 20.02.21 04:38, Amit Kapila wrote:
> I see a problem with this assumption. During the initial
> synchronization, this transaction won't be visible to snapshot and we
> won't copy it. Then later if we won't decode and send it then the
> replica will be out of sync. Such a problem won't happen with Ajin's
> patch.
You are assuming that the initial snapshot is a) logical and b) dumb.
A physical snapshot does "see" prepared transactions and will
restore them to their prepared state. But even in the logical case, I
think it's beneficial to keep the decoder simpler and instead require
some support for two-phase commit in the initial synchronization logic.
For example, using the following approach (you will recognize
similarities to what snapbuild does):
1.) create the slot
2.) start to retrieve changes and queue them
3.) wait for the prepared transactions that were pending at the
point in time of step 1 to complete
4.) take a snapshot (by visibility, without needing to "see" prepared
transactions)
5.) apply the snapshot
6.) replay the queue, filtering commits already visible in the
snapshot
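To make the ordering of steps 1 to 6 concrete, here is a toy simulation. It is a sketch under stated assumptions, not PostgreSQL code: `Event`, `initial_sync` and the event kinds are hypothetical, transactions are reduced to integer xids, and the "replica" is just the set of committed xids. As a design choice of this sketch, step 6 skips every queued event of a transaction whose outcome the snapshot already reflects (those visible in it, plus those pending at step 1, which have all finished by step 4), not only the commit, so that a PREPARE is never applied on top of its own effects.

```python
from dataclasses import dataclass

# Event kinds in this toy model (hypothetical, not PostgreSQL's).
COMMITS = {"commit", "commit_prepared"}
FINISHED = COMMITS | {"rollback_prepared"}


@dataclass(frozen=True)
class Event:
    """One decoded change: its kind and the xid of its transaction."""
    kind: str  # "prepare", "commit", "commit_prepared" or "rollback_prepared"
    xid: int


def initial_sync(committed_before_slot, pending_at_slot, stream):
    """Simulate steps 1-6; return (snapshot, replayed events, replica state)."""
    # Step 1: slot created; note the prepared xacts still pending at this point.
    waiting = set(pending_at_slot)
    visible = set(committed_before_slot)
    # Step 2: every change decoded from the slot is queued, in order.
    queue = []
    events = iter(stream)
    # Step 3: keep queuing until all xacts pending at step 1 have finished.
    while waiting:
        ev = next(events, None)
        if ev is None:
            raise RuntimeError(f"still waiting for xids {sorted(waiting)}")
        queue.append(ev)
        if ev.kind in COMMITS:
            visible.add(ev.xid)  # a snapshot taken later will see this commit
        if ev.kind in FINISHED:
            waiting.discard(ev.xid)
    # Step 4: snapshot by visibility only; still-prepared xacts are not in it.
    snapshot = frozenset(visible)
    # Decoding continues while the snapshot is being applied.
    queue.extend(events)
    # Step 5: apply the snapshot (the replica is just a set of committed xids).
    replica = set(snapshot)
    # Xacts pending at step 1 have finished, so the snapshot reflects their
    # outcome (committed -> visible, rolled back -> absent).
    already_applied = snapshot | set(pending_at_slot)
    # Step 6: replay the queue, skipping transactions already reflected.
    replayed = []
    for ev in queue:
        if ev.xid in already_applied:
            continue
        replayed.append(ev)
        if ev.kind in COMMITS:
            replica.add(ev.xid)
    return snapshot, replayed, replica


if __name__ == "__main__":
    # xid 2 was prepared before the slot existed: the case raised upthread.
    stream = [
        Event("commit", 3),
        Event("prepare", 4),
        Event("commit_prepared", 2),  # last pending xact finishes: step 3 ends
        Event("commit", 5),
        Event("commit_prepared", 4),
    ]
    snapshot, replayed, replica = initial_sync({1}, {2}, stream)
    print(sorted(snapshot))                     # [1, 2, 3]
    print([(e.kind, e.xid) for e in replayed])  # [('prepare', 4), ('commit', 5), ('commit_prepared', 4)]
    print(sorted(replica))                      # [1, 2, 3, 4, 5]
```

In the example, transaction 2 is invisible when the slot is created, yet it still reaches the replica, via the snapshot rather than via the decoder; transaction 4, still prepared at step 4, is replayed from the queue as PREPARE followed by COMMIT PREPARED.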
Just as with the solution proposed by Ajin and you, this has the danger
of showing transactions as committed without the effects of the PREPAREs
being "visible" (after step 5 but before 6).
However, this approach of solving the problem outside of the walsender
has two advantages:
* The delay in step 3 can be made visible and dealt with. As there's
no upper bound on that delay, it makes sense to e.g. inform the
user after 10 minutes and provide a list of two-phase transactions
still in progress.
* It becomes possible to avoid inconsistencies during the
reconciliation window between steps 5 and 6 by disallowing
concurrent (user) transactions until step 6 has completed.
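The wait in step 3, with the kind of reporting suggested in the first point, could look roughly like the following sketch. `wait_for_prepared`, `still_pending` and `report` are hypothetical names; `still_pending` is assumed to return the GIDs of those transactions from step 1 that are still prepared (in PostgreSQL it might be backed by a query on the `pg_prepared_xacts` view, but that wiring is an assumption and is left out). The clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, List, Set


def wait_for_prepared(
    still_pending: Callable[[], Set[str]],
    report: Callable[[float, List[str]], None],
    warn_after: float = 600.0,  # 10 minutes, the example threshold from the text
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until no transaction pending at step 1 is still prepared.

    Calls report(elapsed, gids) once, when the wait first reaches warn_after,
    and returns the total time waited.
    """
    start = clock()
    warned = False
    while True:
        pending = still_pending()
        elapsed = clock() - start
        if not pending:
            return elapsed
        if not warned and elapsed >= warn_after:
            # Tell the user what step 3 is blocked on, sorted for stable output.
            report(elapsed, sorted(pending))
            warned = True
        sleep(poll_interval)
```

Reporting only once keeps the sketch short; a real tool would more likely repeat the notice periodically, since the delay has no upper bound.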
The current implementation, by contrast, hides this in the walsender
without any way to determine how long a PREPARE has been delayed or when
consistency has been reached. (Short of using the very same initial
snapshotting approach outlined above, of course, for which the reordering
logic in the walsender does more harm than good.)
Essentially, I think I'm saying that while I agree that some kind of
snapshot synchronization logic is needed, it should live in a different
place.
Regards
Markus