RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "ikedamsh(at)oss(dot)nttdata(dot)com" <ikedamsh(at)oss(dot)nttdata(dot)com>, "zyu(at)yugabyte(dot)com" <zyu(at)yugabyte(dot)com>, "ibrar(dot)ahmad(at)gmail(dot)com" <ibrar(dot)ahmad(at)gmail(dot)com>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "masahiko(dot)sawada(at)2ndquadrant(dot)com" <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-06-10 07:08:37
Message-ID: TYAPR01MB29905116075A10713D01AE30FE359@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
> If we accept each elementary-commit (via FDW connection) to fail, the
> parent(?) there's no way the root 2pc-commit can succeed. How can we
> ignore the fdw-error in that case?

No, we don't ignore the error during FDW commit. As mentioned at the end of this mail, the question is how the FDW reports the error to the caller (the transaction manager in Postgres core), and how we should handle it.

As shown below, Glassfish catches the resource manager's error during commit, retries the commit if the error is transient or a communication failure, and hands off the processing of the failed commit to the recovery manager. (I used all of my energy today; I'd be grateful if someone could figure out whether Glassfish reports the error to the application.)

[XATerminatorImpl.java]
public void commit(Xid xid, boolean onePhase) throws XAException {
    ...
    } else {
        coord.commit();
    }

[TopCoordinator.java]
    // Commit all participants. If a fatal error occurs during
    // this method, then the process must be ended with a fatal error.
    ...
    try {
        participants.distributeCommit();
    } catch (Throwable exc) {

[RegisteredResources.java]
void distributeCommit() throws HeuristicMixed, HeuristicHazard, NotPrepared {
    ...
    // Browse through the participants, committing them. The following is
    // intended to be done asynchronously as a group of operations.
    ...
    // Tell the resource to commit.
    // Catch any exceptions here; keep going until
    // no exception is left.
    ...
    // If the exception is neither TRANSIENT or
    // COMM_FAILURE, it is unexpected, so display a
    // message and give up with this Resource.
    ...
    // For TRANSIENT or COMM_FAILURE, wait
    // for a while, then retry the commit.
    ...
    // If the retry limit has been exceeded,
    // end the process with a fatal error.
    ...
    if (!transactionCompleted) {
        if (coord != null)
            RecoveryManager.addToIncompleTx(coord, true);
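
To make the retry policy in those comments concrete, here is a minimal sketch of the per-resource decision. It is not the Glassfish code: Participant, CommitFailure, FailureKind, CommitOutcome and RetryingCommitter are hypothetical names, and where Glassfish's comment says the process ends with a fatal error once the retry limit is exceeded, this sketch just returns an outcome so the caller can decide (for example, by handing the transaction to a recovery manager).

```java
// Hypothetical types for illustration only; none of these come from Glassfish.
enum FailureKind { TRANSIENT, COMM_FAILURE, OTHER }

class CommitFailure extends Exception {
    final FailureKind kind;
    CommitFailure(FailureKind kind) { super(kind.name()); this.kind = kind; }
}

interface Participant {
    void commit() throws CommitFailure;
}

enum CommitOutcome { COMMITTED, GAVE_UP, RETRIES_EXHAUSTED }

final class RetryingCommitter {
    // Try to commit one participant, retrying only TRANSIENT and COMM_FAILURE errors.
    static CommitOutcome commitWithRetry(Participant p, int maxRetries, long waitMillis) {
        int retries = 0;
        while (true) {
            try {
                p.commit();
                return CommitOutcome.COMMITTED;
            } catch (CommitFailure e) {
                if (e.kind == FailureKind.OTHER) {
                    // Unexpected error: give up on this resource.
                    return CommitOutcome.GAVE_UP;
                }
                if (retries >= maxRetries) {
                    // Retry limit exceeded; the caller decides what happens next.
                    return CommitOutcome.RETRIES_EXHAUSTED;
                }
                retries++;
                try {
                    Thread.sleep(waitMillis); // wait for a while, then retry
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return CommitOutcome.RETRIES_EXHAUSTED;
                }
            }
        }
    }
}
```

Any outcome other than COMMITTED corresponds to the "!transactionCompleted" case above, where the commit is left for someone else to finish.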

> > No. Taking the description literally and considering the relevant XA
> specification, it's not about the remote commit failure. The remote server is
> not allowed to fail the commit once it has reported successful prepare, which is
> the contract of 2PC. HeuristicMixedException is about the manual resolution,
> typically by the DBA, using the DBMS-specific tool or the standard
> commit()/rollback() API.
>
> Mmm. The above seems as if saying that 2pc-comit does not interact
> with remotes. The interface contract does not cover everything that
> happens in the real world. If remote-commit fails, that is just an
> issue outside of the 2pc world. In reality remote-commit may fail for
> all reasons.

The following part of the XA specification is relevant. We're considering modeling the FDW 2PC interface on XA, because it seems to be the only standard interface and thus one that other FDWs would naturally take advantage of, aren't we? Then we need to take care of things like this. The interface design is not easy, so proper design and its review should come first, before going deeper into the huge code patch.

2.3.3 Heuristic Branch Completion
--------------------------------------------------
Some RMs may employ heuristic decision-making: an RM that has prepared to
commit a transaction branch may decide to commit or roll back its work independently
of the TM. It could then unlock shared resources. This may leave them in an
inconsistent state. When the TM ultimately directs an RM to complete the branch, the
RM may respond that it has already done so. The RM reports whether it committed
the branch, rolled it back, or completed it with mixed results (committed some work
and rolled back other work).

An RM that reports heuristic completion to the TM must not discard its knowledge of
the transaction branch. The TM calls the RM once more to authorise it to forget the
branch. This requirement means that the RM must notify the TM of all heuristic
decisions, even those that match the decision the TM requested. The referenced
OSI DTP specifications (model) and (service) define heuristics more precisely.
--------------------------------------------------
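
As a rough illustration of that rule, the sketch below shows a transaction manager recording a heuristic outcome and then calling forget. BranchHandle and HeuristicAwareCommit are hypothetical, cut-down names of my own (a real javax.transaction.xa.XAResource takes an Xid and has more methods); only XAException, its XA_HEUR* constants and XAResource.XA_OK are the real JTA definitions.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

// Hypothetical, cut-down view of one prepared branch on a resource manager.
interface BranchHandle {
    void commit() throws XAException;
    void forget() throws XAException;
}

final class HeuristicAwareCommit {
    static boolean isHeuristic(int errorCode) {
        return errorCode == XAException.XA_HEURCOM
            || errorCode == XAException.XA_HEURRB
            || errorCode == XAException.XA_HEURMIX
            || errorCode == XAException.XA_HEURHAZ;
    }

    // Returns XA_OK on a normal commit, or the heuristic code the RM reported.
    // Non-heuristic errors are rethrown for the caller to handle.
    static int commitBranch(BranchHandle branch) throws XAException {
        try {
            branch.commit();
            return XAResource.XA_OK;
        } catch (XAException e) {
            if (!isHeuristic(e.errorCode)) {
                throw e;
            }
            int outcome = e.errorCode; // record the RM's independent decision
            branch.forget();           // authorise the RM to discard the branch
            return outcome;
        }
    }
}
```

Note that XA_HEURCOM still goes through forget even though it matches what the TM asked for, which is the point of the last sentences of the quoted section.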

> https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distr
> ibuted-transaction-that-uses-jta-methods
> This suggests that both XAResoruce.prepare() and commit() can throw a
> exception.

Yes, XAResource.commit() can throw an exception:

void commit(Xid xid, boolean onePhase) throws XAException

Throws: XAException
An error has occurred. Possible XAExceptions are XA_HEURHAZ, XA_HEURCOM,
XA_HEURRB, XA_HEURMIX, XAER_RMERR, XAER_RMFAIL, XAER_NOTA,
XAER_INVAL, or XAER_PROTO.

This is equivalent to xa_commit() in the XA specification. xa_commit() can return error codes that have the same names as above.
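
For discussion, here is one way a caller might group the codes listed above. The constants are the real ones from javax.transaction.xa.XAException; the CommitErrorClass grouping and the XaCommitErrors helper are my own assumption about what could be useful, not something the XA or JTA specifications define.

```java
import javax.transaction.xa.XAException;

// Hypothetical grouping, for illustration only.
enum CommitErrorClass { HEURISTIC, RM_ERROR, RM_UNAVAILABLE, CALLER_ERROR, UNKNOWN }

final class XaCommitErrors {
    static CommitErrorClass classify(XAException e) {
        switch (e.errorCode) {
            case XAException.XA_HEURHAZ:
            case XAException.XA_HEURCOM:
            case XAException.XA_HEURRB:
            case XAException.XA_HEURMIX:
                // The RM completed the branch on its own.
                return CommitErrorClass.HEURISTIC;
            case XAException.XAER_RMERR:
                // The RM reported an error while committing.
                return CommitErrorClass.RM_ERROR;
            case XAException.XAER_RMFAIL:
                // The RM is unavailable.
                return CommitErrorClass.RM_UNAVAILABLE;
            case XAException.XAER_NOTA:
            case XAException.XAER_INVAL:
            case XAException.XAER_PROTO:
                // Unknown XID, invalid arguments, or a call in the wrong context.
                return CommitErrorClass.CALLER_ERROR;
            default:
                return CommitErrorClass.UNKNOWN;
        }
    }
}
```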

The question we're trying to answer here is:

* How should such an error be handled?
Glassfish (and possibly other Java EE servers) catch the error, continue to commit the rest of the participants, and handle the failed resource manager's commit in the background. In Postgres, if we allow FDWs to do ereport(ERROR), how can we do similar things?

* Should we report the error to the client? If yes, should it be reported as a failure of commit, or as an informational message (WARNING) of a successful commit? Why would the client want to know about the error when the global transaction's commit has already been promised?
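
To have something concrete to argue about, here is a sketch of one of the options above: the coordinator never aborts the loop on a participant's error, keeps committing the rest, and returns the failed branches together with warning-style messages. This is not a proposal for the actual FDW API; PreparedBranch, GlobalCommitResult and CommitCoordinator are hypothetical names, and it is written in Java only to stay close to the Glassfish excerpts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for committing a prepared transaction on one foreign server.
interface PreparedBranch {
    String name();
    void commitPrepared() throws Exception;
}

final class GlobalCommitResult {
    // Messages that could be surfaced to the client as warnings.
    final List<String> warnings = new ArrayList<>();
    // Branches left for a background resolver to finish.
    final List<PreparedBranch> unresolved = new ArrayList<>();

    boolean fullyResolved() { return unresolved.isEmpty(); }
}

final class CommitCoordinator {
    // Commit every branch; a failure on one branch does not stop the others.
    static GlobalCommitResult commitAll(List<PreparedBranch> branches) {
        GlobalCommitResult result = new GlobalCommitResult();
        for (PreparedBranch b : branches) {
            try {
                b.commitPrepared();
            } catch (Exception e) {
                result.unresolved.add(b);
                result.warnings.add("commit of prepared transaction on "
                        + b.name() + " deferred: " + e.getMessage());
            }
        }
        return result;
    }
}
```

In this shape the global commit is still reported as successful, and the warnings only tell the client that some participants will be resolved later. Whether that is the right thing to report is exactly the second question.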

Regards
Takayuki Tsunakawa
