Re: Transactions involving multiple postgres foreign servers

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2015-01-11 08:36:15
Message-ID: CAB7nPqRAWThjLj1_TkcLpuO-QNGvabBmHckw+Q7pX0wFpTH4AQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 11, 2015 at 10:37 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 1/10/15, 7:11 AM, Michael Paquier wrote:
>>>
>>> If we had an independent transaction coordinator then I agree with you
>>> Kevin. I think Robert is proposing that if we are controlling one of the
>>> nodes that's participating as well as coordinating the overall transaction
>>> that we can take some shortcuts. AIUI a PREPARE means you are completely
>>> ready to commit. In essence you're just waiting to write and fsync the
>>> commit message. That is in fact the state that a coordinating PG node would
>>> be in by the time everyone else has done their prepare. So from that
>>> standpoint we're OK.
>>>
>>> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
>>> to commit as well! That would require it to have a real prepared transaction
>>> of its own created. However, as long as there is zero chance of any other
>>> prepared transactions committing before our local transaction, that step
>>> isn't actually needed. Our local transaction will either commit or abort,
>>> and that will determine what needs to happen on all other nodes.
>>
>> It is a property of 2PC to ensure that a prepared transaction will
>> commit. Now, once it is confirmed on the coordinator that all the
>> remote nodes have successfully PREPAREd, the coordinator issues COMMIT
>> PREPARED to each node. What do you do if some nodes report ABORT
>> PREPARED while other nodes report COMMIT PREPARED? Do you abort the
>> transaction on the coordinator, commit it, or FATAL? This leaves the
>> cluster in an inconsistent state, meaning that some consistent cluster-wide
>> recovery point is needed as well (Postgres-XC and XL have introduced
>> the concept of barriers for such problems, stuff created first by
>> Pavan Deolasee).
>
>
> My understanding is that once you get a successful PREPARE that should mean
> that it's basically impossible for the transaction to fail to commit. If
> that's not the case, I fail to see how you can get any decent level of
> sanity out of this...

When the responsibility for a group of COMMIT PREPARED commands is
handed to a set of nodes in a network, several problems can show up,
split-brain for example. There can also be hardware-level failures,
so you would need a mechanism ensuring that WAL is consistent among
all the nodes, for example by adding a common restore point on all the
nodes with XLOG_RESTORE_POINT once PREPARE has succeeded everywhere.
That's one reason why I think the local Coordinator should use 2PC as
well: it gives a consistency point once all the remote nodes have
successfully PREPAREd. It is also a reason why things can get
complicated for either the DBA or the upper application in charge of
ensuring DB consistency, even in case of critical failures.
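
The ordering described above can be sketched roughly as follows. This is
only an illustration under assumptions: the function name
plan_two_phase_commit, the node names and the prepare_ok input are
hypothetical, and nothing is sent to a server. The code just lists the
statements a coordinator might issue. pg_create_restore_point() is the SQL
function that writes an XLOG_RESTORE_POINT record. Creating the
same-named restore point on each node is not atomic across nodes, so
this does not by itself solve the failure cases mentioned above.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (node name, SQL statement to send to that node)


def plan_two_phase_commit(gid: str, prepare_ok: Dict[str, bool],
                          coordinator: str = "coordinator") -> List[Step]:
    """Return the statements a coordinator would issue, in order.

    prepare_ok maps each remote node to whether its PREPARE succeeded
    (a hypothetical stand-in for actually sending the statement).
    """
    steps: List[Step] = []
    prepared: List[str] = []

    # Phase 1: every remote node prepares; stop at the first failure.
    for node, ok in prepare_ok.items():
        steps.append((node, f"PREPARE TRANSACTION '{gid}'"))
        if not ok:
            break
        prepared.append(node)

    if len(prepared) < len(prepare_ok):
        # A PREPARE failed: roll back the nodes that did prepare, then
        # abort the coordinator's own (not yet prepared) transaction.
        steps += [(n, f"ROLLBACK PREPARED '{gid}'") for n in prepared]
        steps.append((coordinator, "ROLLBACK"))
        return steps

    # The coordinator prepares as well, so it is in the same
    # "ready to commit" state as the remote nodes.
    steps.append((coordinator, f"PREPARE TRANSACTION '{gid}'"))
    everyone = prepared + [coordinator]

    # Common consistency point: a same-named restore point in each WAL.
    steps += [(n, f"SELECT pg_create_restore_point('{gid}')")
              for n in everyone]

    # Phase 2: commit everywhere, coordinator included.
    steps += [(n, f"COMMIT PREPARED '{gid}'") for n in everyone]
    return steps
```

For example, plan_two_phase_commit("tx1", {"a": True, "b": False})
yields a PREPARE on both nodes, a ROLLBACK PREPARED on "a" only, and a
plain ROLLBACK on the coordinator.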
--
Michael
