From: | Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> |
---|---|
To: | Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Subject: | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date: | 2020-07-17 06:55:32 |
Message-ID: | CA+fd4k6J2nL7OfM50iknrdavJPHS8GLb85rFhKSuFE-KkLcR0w@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2020-07-16 13:16, Masahiko Sawada wrote:
> > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
> > wrote:
> >>
> >> > I've attached the latest version patches. I've incorporated the review
> >> > comments I got so far and improved locking strategy.
> >>
> >> I want to ask a question about streaming replication with 2PC.
> >> Are you going to support 2PC with streaming replication?
> >>
> >> I tried streaming replication using v23 patches.
> >> I confirmed that 2PC works with streaming replication,
> >> where there are primary and standby coordinators.
> >>
> >> But, in my understanding, the WAL of "PREPARE" and
> >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in
> >> sync.
> >>
> >> If this is right, an unresolved transaction can occur.
> >>
> >> For example,
> >>
> >> 1. PREPARE is done
> >> 2. crash primary before the WAL related to PREPARE is
> >> replicated to the standby server
> >> 3. promote standby server // but can't execute "ABORT PREPARED"
> >>
> >> In the above case, the remote server has an unresolved transaction.
> >> Can we solve this problem to support in-sync replication?
> >>
> >> But, I think some users use async replication for performance.
> >> Do we need to document the limitation or make another solution?
> >>
> >
> > IIUC with synchronous replication, we can guarantee that WAL records
> > are written on both primary and replicas when the client gets an
> > acknowledgment of commit. We don't replicate each WAL record
> > generated during the transaction one by one in sync. In the case you
> > described, the client will get an error due to the server crash.
> > Therefore I think the user cannot expect that the WAL records
> > generated so far have been replicated. The same issue could also
> > happen when the user executes PREPARE TRANSACTION and the server
> > crashes.
>
> Thanks! I didn't notice that the behavior is the same when a user
> executes PREPARE TRANSACTION.
>
> IIUC, there is one difference between (1) PREPARE TRANSACTION
> and (2) 2PC.
> The difference is whether the client can know when the server crashed
> and its global tx id.
>
> If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
> same command again,
> because if the remote server is already prepared the command will be
> ignored.
>
> But if (2) 2PC fails due to a coordinator crash, the client can't know
> what operations should be done.
>
> If the old coordinator already executed PREPARE, there are some
> transactions which should be resolved with ABORT PREPARED.
> But if the PREPARE WAL is not sent to the standby, the new coordinator
> can't execute ABORT PREPARED.
> And the client can't know which remote servers have PREPARED
> transactions which should be ABORTED either.
>
> Even if the client can know that, only the old coordinator knows its
> global transaction id.
> Only the database administrator can analyze the old coordinator's log
> and then execute the appropriate commands manually, right?
I think that's right. In the case of a coordinator crash, the user
can look for orphaned foreign prepared transactions by checking the
'identifier' column of pg_foreign_xacts on the new standby server and
the prepared transactions on the remote servers.
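A minimal sketch of that comparison, not something the patch provides.
It assumes the administrator has already collected the 'identifier'
values from pg_foreign_xacts on the coordinator and the gid values from
pg_prepared_xacts on each remote server. The function name, the server
names, and the "fx_" identifier prefix are hypothetical; the prefix is
only there so that prepared transactions created by unrelated
applications are not reported.

```python
from typing import Dict, Set, Tuple


def find_orphaned_prepared(
    coordinator_ids: Set[str],
    remote_gids: Dict[str, Set[str]],
    prefix: str = "fx_",  # hypothetical naming convention, not from the patch
) -> Set[Tuple[str, str]]:
    """Return (server, gid) pairs that are prepared on a remote server
    but have no matching identifier on the coordinator."""
    orphans = set()
    for server, gids in remote_gids.items():
        for gid in gids:
            # Skip prepared transactions that do not look like ours.
            if not gid.startswith(prefix):
                continue
            if gid not in coordinator_ids:
                orphans.add((server, gid))
    return orphans


if __name__ == "__main__":
    # Identifiers the promoted coordinator still knows about.
    known = {"fx_100_s1"}
    # Prepared transaction gids found on each remote server.
    remote = {
        "s1": {"fx_100_s1", "fx_101_s1"},
        "s2": {"fx_101_s2", "app_tx"},
    }
    for server, gid in sorted(find_orphaned_prepared(known, remote)):
        print(f"orphaned on {server}: {gid}")
```

Whether an orphan found this way should be committed or rolled back
still has to be decided from the old coordinator's log, as discussed
above.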
>
>
> > To prevent this
> > issue, I think we would need to send each WAL record in sync but I'm
> > not sure it's reasonable behavior, and as long as we write WAL
> > locally and then send it to replicas we would need a smart mechanism
> > to prevent this situation.
>
> I agree. Sending each 2PC WAL record in sync would have a large
> performance impact.
> At least, we need to document the limitation and how to handle this
> situation.
Ok. I'll add it.
>
>
> > Related to the point raised by Ikeda-san, I realized that with the
> > current patch the backend waits for synchronous replication and then
> > waits for foreign transaction resolution. But it should be reversed.
> > Otherwise, it could lead to data loss even when the client got an
> > acknowledgment of commit. Also, when the user is using both atomic
> > commit and synchronous replication and wants to cancel waiting, he/she
> > will need to press Ctrl-C twice with the current patch, which also
> > should be fixed.
>
> I'm sorry, but I don't understand.
>
> In my understanding, if the COMMIT WAL is replicated to the standby in
> sync, the standby server can resolve the transaction after crash
> recovery in the promotion phase.
>
> If reversed, there are some situations in which atomic commit can't be
> guaranteed.
> In the case where some foreign transaction resolutions succeed but
> others fail (and the COMMIT WAL is not replicated),
> the standby must ABORT PREPARED because the COMMIT WAL is not
> replicated.
> This means that some foreign transactions are resolved with COMMIT
> PREPARED by the primary coordinator,
> while other foreign transactions can be resolved with ABORT PREPARED by
> the secondary coordinator.
You're right. Thank you for pointing that out!
If the coordinator crashes after the client gets acknowledgment of the
successful commit of the transaction but before sending the
XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are
left on the replicas even after failover. But since we require the FDW
to tolerate the error of undefined prepared transactions in
COMMIT/ROLLBACK PREPARED, it won't be a critical problem.
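To illustrate why a leftover FdwXact entry is harmless under that
requirement, here is a small self-contained simulation. It is only a
sketch: FakeRemoteServer, UndefinedPreparedTransaction, and
resolve_commit are hypothetical names standing in for a remote server,
its "prepared transaction ... does not exist" error, and the FDW's
commit callback. None of them come from the patch.

```python
class UndefinedPreparedTransaction(Exception):
    """Stand-in for the remote 'prepared transaction does not exist' error."""


class FakeRemoteServer:
    """In-memory model of a remote server holding prepared transactions."""

    def __init__(self, prepared):
        self.prepared = set(prepared)
        self.committed = set()

    def commit_prepared(self, gid):
        # Like COMMIT PREPARED: fails if the gid is not currently prepared.
        if gid not in self.prepared:
            raise UndefinedPreparedTransaction(gid)
        self.prepared.remove(gid)
        self.committed.add(gid)


def resolve_commit(server, gid):
    """Commit gid on server, tolerating an already-resolved transaction.

    Returns True if this call performed the commit and False if the
    prepared transaction was already gone.
    """
    try:
        server.commit_prepared(gid)
        return True
    except UndefinedPreparedTransaction:
        # Assumed to mean an earlier coordinator already resolved it.
        return False


if __name__ == "__main__":
    remote = FakeRemoteServer(prepared={"fx_100_s1"})
    # The old coordinator resolves the transaction, then crashes before
    # the removal of its FdwXact entry reaches the replica.
    first = resolve_commit(remote, "fx_100_s1")
    # The promoted coordinator still has the entry and retries.
    second = resolve_commit(remote, "fx_100_s1")
    print(first, second, sorted(remote.committed))
```

The retry by the promoted coordinator finds nothing to commit and
simply drops the entry, so the remote state is unchanged.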
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services