Re: Transactions involving multiple postgres foreign servers

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Vinayak Pokale <vinpokale(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2016-09-28 06:30:00
Message-ID: CAFjFpRc8C=6E7UxswgFj9dUo0dursZec5m=DoyCUXN_15rBgJg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 28, 2016 at 10:43 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>>>> My original patch added code to manage the files for 2 phase
>>>>>> transactions opened by the local server on the remote servers. This
>>>>>> code was mostly inspired from the code in twophase.c which manages the
>>>>>> file for prepared transactions. The logic to manage 2PC files has
>>>>>> changed since [1] and has been optimized. One of the things I wanted
>>>>>> to do is see, if those optimizations are applicable here as well. Have
>>>>>> you considered that?
>>>>>>
>>>>>>
>>>>>
>>>>> Yeah, we're considering it.
>>>>> After these changes are committed, we will post a patch that
>>>>> incorporates them.
>>>>>
>>>>> But what we need to do first is the discussion in order to get consensus.
>>>>> Since current design of this patch is to transparently execute DCL of
>>>>> 2PC on foreign server, this code changes lot of code and is
>>>>> complicated.
>>>>
>>>> Can you please elaborate? I am not able to understand what DCL is
>>>> involved here. According to [1], examples of DCL are the GRANT and
>>>> REVOKE commands.
>>>
>>> I meant transaction management command such as PREPARE TRANSACTION and
>>> COMMIT/ABORT PREPARED command.
>>> The web page I referred to might be wrong, sorry.
>>>
>>>>> Another approach I have is to push down DCL to only foreign servers
>>>>> that support 2PC protocol, which is similar to DML push down.
>>>>> This approach would be simpler than the current idea and easier for
>>>>> a distributed transaction manager to use.
>>>>
>>>> Again, can you please elaborate on how that would be different from
>>>> the current approach and how it simplifies the code?
>>>>
>>>
>>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>>> PREPARED to foreign servers that support 2PC.
>>> With this idea, the client needs to do the following when a foreign
>>> server is involved in the transaction.
>>>
>>> BEGIN;
>>> UPDATE parent_table SET ...; -- update including foreign server
>>> PREPARE TRANSACTION 'xact_id';
>>> COMMIT PREPARED 'xact_id';
>>>
>>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>>> down to foreign server.
>>> That is, the client needs to execute PREPARE TRANSACTION and
>>> COMMIT/ROLLBACK PREPARED explicitly.
>>>
>>> In this idea, I think that we don't need to do the following:
>>>
>>> * Providing the prepare id of 2PC.
>>> Current patch adds new API prepare_id_provider() but we can use the
>>> prepare id of 2PC that is used on parent server.
>>>
>>> * Keeping track of status of foreign servers.
>>> Current patch keeps track of status of foreign servers involved with
>>> transaction but this idea is just to push down transaction management
>>> command to foreign server.
>>> So I think that we no longer need to do that.
>>
>> The problem with this approach is the same as the one previously
>> stated. If the connection between the local and foreign server is lost
>> between PREPARE and COMMIT, the prepared transaction on the foreign
>> server remains dangling. Only the local server knows what to do with
>> it, and the local server has lost track of the prepared transaction on
>> the foreign server. So, just pushing down those commands doesn't work.
>
> Yeah, my idea is only a first step.
> A mechanism that resolves the dangling foreign transaction and a
> resolver worker process are necessary.
>
>>>
>>> * Adding max_prepared_foreign_transactions parameter.
>>> It means that the number of transaction involving foreign server is
>>> the same as max_prepared_transactions.
>>>
>>
>> That isn't true exactly. max_prepared_foreign_transactions indicates
>> how many transactions can be prepared on the foreign server, which in
>> the method you propose should have a cap of max_prepared_transactions
>> * number of foreign servers.
>
> Oh, I understood, thanks.
>
> Consider sharding solution using postgres_fdw (that is, the parent
> postgres server has multiple shard postgres servers), we need to
> increase max_prepared_foreign_transactions whenever new shard server
> is added to cluster, or to allocate enough size in advance. But the
> estimation of enough max_prepared_foreign_transactions would not be
> easy, for example can we estimate it by (max throughput of the system)
> * (the number of foreign servers)?
>
> One new idea I came up with is that we set transaction id on parent
> server to global transaction id (gid) that is prepared on shard
> server.
> And pg_fdw_resolver worker process periodically resolves the dangling
> transaction on foreign server by comparing active lowest XID on parent
> server with the XID in gid used by PREPARE TRANSACTION.
>
> For example, suppose that there are one parent server and one shard
> server, and the client executes update transaction (XID = 100)
> involving foreign servers.
> In commit phase, parent server executes PREPARE TRANSACTION command
> with gid containing 100, say
> 'px_<random number>_100_<serverid>_<userid>', on foreign server.
> If the shard server crashed before COMMIT PREPARED, the transaction
> 100 becomes a dangling transaction.
>
> But resolver worker process on parent server can resolve it with
> following steps.
> 1. Get lowest active XID on parent server(XID=110).
> 2. Connect to foreign server. (Get foreign server information from
> pg_foreign_server system catalog.)
> 3. Check if there is prepared transaction with XID less than 110.
> 4. Rollback the dangling transaction found at #3 step.
> gid 'px_<random number>_100_<serverid>_<userid>' is prepared on
> foreign server by transaction 100, rollback it.

Why always roll back every dangling transaction? A foreign server can
have a dangling transaction that needs to be committed because the
portions of that transaction on the other shards have already been
committed.
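
To illustrate the point (a rough sketch, not code from the patch): the
resolver would have to choose the command from the outcome of the
transaction on the local server instead of always rolling back.
`Outcome` and `resolution_command` are names made up for this example,
and how the local outcome is looked up is deliberately left out.

```python
from enum import Enum


class Outcome(Enum):
    """Fate of the transaction on the local (parent) server. Hypothetical type."""
    COMMITTED = "committed"
    ABORTED = "aborted"
    IN_PROGRESS = "in progress"


def resolution_command(gid, parent_outcome):
    """Return the SQL to send to the foreign server for one prepared
    transaction, or None if it must be left alone for now."""
    quoted = gid.replace("'", "''")  # escape quotes for a SQL string literal
    if parent_outcome is Outcome.COMMITTED:
        # Other shards may already have committed their part, so this
        # shard has to commit as well.
        return f"COMMIT PREPARED '{quoted}'"
    if parent_outcome is Outcome.ABORTED:
        return f"ROLLBACK PREPARED '{quoted}'"
    # Still in progress on the parent: not dangling yet.
    return None
```

Always rolling back corresponds to ignoring the COMMITTED branch, which
is exactly the case that would leave the shards inconsistent.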

The way the gid is crafted, there is no way to check whether a given
prepared transaction was created by the local server or not. The local
server probably needs to add a unique signature to the GID to identify
the transactions it prepared itself. That signature should be
transferred to the standby to cope with fail-over of the local server.
With this idea, one also has to keep polling the foreign server to find
any dangling transactions. In the usual scenario we shouldn't have a
large number of dangling transactions, so periodic polling might be a
waste.
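
A rough sketch of the GID signature idea, again not code from the
patch. The extra <signature> field, its hex form, and the helper names
are assumptions for illustration; the remaining fields follow the
'px_<random number>_<xid>_<serverid>_<userid>' layout proposed above.

```python
import re

# Hypothetical layout: px_<signature>_<random number>_<xid>_<serverid>_<userid>
GID_RE = re.compile(r"px_([0-9a-f]+)_(\d+)_(\d+)_(\d+)_(\d+)")


def make_gid(signature, nonce, xid, server_id, user_id):
    """Build a GID carrying the signature of the server that prepared it."""
    return f"px_{signature}_{nonce}_{xid}_{server_id}_{user_id}"


def parse_own_gid(gid, signature):
    """Return the fields of a GID prepared under `signature`, or None if
    the GID has another format or belongs to a different server."""
    m = GID_RE.fullmatch(gid)
    if m is None or m.group(1) != signature:
        return None
    return {
        "xid": int(m.group(3)),
        "server_id": int(m.group(4)),
        "user_id": int(m.group(5)),
    }
```

A resolver would then skip every prepared transaction for which
parse_own_gid() returns None, so it never touches transactions prepared
by some other client or server.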

>
> In this idea, we need gid provider API but parent server doesn't need
> to have persistent foreign transaction data.
> Also we could remove max_prepared_foreign_transactions, and fdw_xact.c
> would become more simple implementation.
>

I agree, but we need to cope with the above two problems.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
