From: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Kevin Grittner <kgrittn(at)ymail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Transactions involving multiple postgres foreign servers |
Date: | 2015-01-09 06:49:21 |
Message-ID: | CAFjFpRf3=ByJ-k5jSveWa5auwo3SK2Z8FamWKUfvTCoSNjEM-w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 8, 2015 at 8:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jan 7, 2015 at 11:20 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> > If we are going to include a distributed transaction manager with
> > PostgreSQL, it *must* persist enough information about the
> > transaction ID and where it is used in a way that will survive a
> > subsequent crash before beginning the PREPARE on any of the
> > systems. After all nodes are PREPAREd it must flag that persisted
> > data to indicate that it is now at a point where ROLLBACK is no
> > longer an option. Only then can it start committing the prepared
> > transactions. After the last node is committed it can clear this
> > information. On start-up the distributed transaction manager must
> > check for any distributed transactions left "in progress" and
> > commit or rollback based on the preceding; doing retries
> > indefinitely until it succeeds or is told to stop.
>
> I think one key question here is whether all of this should be handled
> in PostgreSQL core or whether some of it should be handled in other
> ways. Is the goal to make postgres_fdw (and FDWs for other databases
> that support 2PC) to persist enough information that someone *could*
> write a transaction manager for PostgreSQL, or is the goal to actually
> write that transaction manager?
>
> Just figuring out how to persist the necessary information is a
> non-trivial problem by itself. You might think that you could just
> insert a row into a local table saying, hey, I'm about to prepare a
> transaction remotely, but of course that doesn't work: if you then go
> on to PREPARE before writing and flushing the local commit record,
> then a crash before that's done leaves a dangling prepared transaction
> on the remote note. You might think to write the record, then after
> writing and flush the local commit record do the PREPARE. But you
> can't do that either, because now if the PREPARE fails you've already
> committed locally.
>
> I guess what you need to do is something like:
>
> 1. Write and flush a WAL record indicating an intent to prepare, with
> a list of foreign server OIDs and GUIDs.
> 2. Prepare the remote transaction on each node. If any of those
> operations fail, roll back any prepared nodes and error out.
> 3. Commit locally (i.e. RecordTransactionCommit, writing and flushing WAL).
> 4. Try to commit the remote transactions.
> 5. Write a WAL record indicating that you committed the remote
> transactions OK.
>
> If you fail after step 1, you can straighten things out by looking at
> the status of the transaction: if the transaction committed, any
> transactions we intended-to-prepare need to be checked. If they are
> still prepared, we need to commit them or roll them back according to
> what happened to our XID.
>
When you want to strengthen and commit things, the foreign server may not
be available to do that. As Kevin pointed out in above, we need to keep on
retrying to resolve (commit or rollback based on the status of local
transaction) the PREPAREd transactions on foreign server till they are
resolved. So, we will have to persist the information somewhere else than
the WAL OR keep on persisting the WALs even after the corresponding local
transaction has been committed or aborted, which I don't think is a good
idea, since that will have impact on replication, VACUUM esp. because it's
going to affect the oldest transaction in WAL.
That's where Andres's suggestion might help.
>
> (Andres is talking in my other ear suggesting that we ought to reuse
> the 2PC infrastructure to do all this. I'm not convinced that's a
> good idea, but I'll let him present his own ideas here if he wants to
> rather than trying to explain them myself.)
>
>
We can persist the information about distributed transaction (which esp.
require 2PC) similar to the way as 2PC infrastructure in pg_twophase
directory. I am still investigating whether we can re-use existing 2PC
infrastructure or not. My initial reaction is no, since 2PC persists
information about local transaction including locked objects, WALs (?) in
pg_twophase directory, which is not required for a distributed transaction.
But rest of the mechanism like the manner of processing the records during
normal processing and recovery looks very useful.
--
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Kirkwood | 2015-01-09 06:53:48 | Re: List of table names of a DB |
Previous Message | Tatsuo Ishii | 2015-01-09 06:38:50 | Re: List of table names of a DB |