Re: Transactions involving multiple postgres foreign servers

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Vinayak Pokale <vinpokale(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2016-10-04 01:47:23
Message-ID: CAD21AoCsmPjTfQt0tgd-wzQ3zpJ-=FDWMMuwwhk4AKffyU5-4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 28, 2016 at 3:30 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Wed, Sep 28, 2016 at 10:43 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>>>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>>>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>>>>> My original patch added code to manage the files for 2 phase
>>>>>>> transactions opened by the local server on the remote servers. This
>>>>>>> code was mostly inspired from the code in twophase.c which manages the
>>>>>>> file for prepared transactions. The logic to manage 2PC files has
>>>>>>> changed since [1] and has been optimized. One of the things I wanted
>>>>>>> to do is see, if those optimizations are applicable here as well. Have
>>>>>>> you considered that?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Yeah, we're considering it.
>>>>>> After these changes are committed, we will post the patch incorporated
>>>>>> these changes.
>>>>>>
>>>>>> But what we need to do first is the discussion in order to get consensus.
>>>>>> Since current design of this patch is to transparently execute DCL of
>>>>>> 2PC on foreign server, this code changes lot of code and is
>>>>>> complicated.
>>>>>
>>>>> Can you please elaborate. I am not able to understand what DCL is
>>>>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>>>>> command.
>>>>
>>>> I meant transaction management command such as PREPARE TRANSACTION and
>>>> COMMIT/ABORT PREPARED command.
>>>> The web page I refered might be wrong, sorry.
>>>>
>>>>>> Another approach I have is to push down DCL to only foreign servers
>>>>>> that support 2PC protocol, which is similar to DML push down.
>>>>>> This approach would be more simpler than current idea and is easy to
>>>>>> use by distributed transaction manager.
>>>>>
>>>>> Again, can you please elaborate, how that would be different from the
>>>>> current approach and how does it simplify the code.
>>>>>
>>>>
>>>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>>>> PREPARED to foreign servers that support 2PC.
>>>> With this idea, the client need to do following operation when foreign
>>>> server is involved with transaction.
>>>>
>>>> BEGIN;
>>>> UPDATE parent_table SET ...; -- update including foreign server
>>>> PREPARE TRANSACTION 'xact_id';
>>>> COMMIT PREPARED 'xact_id';
>>>>
>>>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>>>> down to foreign server.
>>>> That is, the client needs to execute PREPARE TRANSACTION and
>>>>
>>>> In this idea, I think that we don't need to do followings,
>>>>
>>>> * Providing the prepare id of 2PC.
>>>> Current patch adds new API prepare_id_provider() but we can use the
>>>> prepare id of 2PC that is used on parent server.
>>>>
>>>> * Keeping track of status of foreign servers.
>>>> Current patch keeps track of status of foreign servers involved with
>>>> transaction but this idea is just to push down transaction management
>>>> command to foreign server.
>>>> So I think that we no longer need to do that.
>>>
>>>> COMMIT/ROLLBACK PREPARED explicitly.
>>>
>>> The problem with this approach is same as one previously stated. If
>>> the connection between local and foreign server is lost between
>>> PREPARE and COMMIT the prepared transaction on the foreign server
>>> remains dangling, none other than the local server knows what to do
>>> with it and the local server has lost track of the prepared
>>> transaction on the foreign server. So, just pushing down those
>>> commands doesn't work.
>>
>> Yeah, my idea is one of the first step.
>> Mechanism that resolves the dangling foreign transaction and the
>> resolver worker process are necessary.
>>
>>>>
>>>> * Adding max_prepared_foreign_transactions parameter.
>>>> It means that the number of transaction involving foreign server is
>>>> the same as max_prepared_transactions.
>>>>
>>>
>>> That isn't true exactly. max_prepared_foreign_transactions indicates
>>> how many transactions can be prepared on the foreign server, which in
>>> the method you propose should have a cap of max_prepared_transactions
>>> * number of foreign servers.
>>
>> Oh, I understood, thanks.
>>
>> Consider sharding solution using postgres_fdw (that is, the parent
>> postgres server has multiple shard postgres servers), we need to
>> increase max_prepared_foreign_transactions whenever new shard server
>> is added to cluster, or to allocate enough size in advance. But the
>> estimation of enough max_prepared_foreign_transactions would not be
>> easy, for example can we estimate it by (max throughput of the system)
>> * (the number of foreign servers)?
>>
>> One new idea I came up with is that we set transaction id on parent
>> server to global transaction id (gid) that is prepared on shard
>> server.
>> And pg_fdw_resolver worker process periodically resolves the dangling
>> transaction on foreign server by comparing active lowest XID on parent
>> server with the XID in gid used by PREPARE TRANSACTION.
>>
>> For example, suppose that there are one parent server and one shard
>> server, and the client executes update transaction (XID = 100)
>> involving foreign servers.
>> In commit phase, parent server executes PREPARE TRANSACTION command
>> with gid containing 100, say 'px_<random
>> number>_100_<serverid>_<userid>', on foreign server.
>> If the shard server crashed before COMMIT PREPARED, the transaction
>> 100 become danging transaction.
>>
>> But resolver worker process on parent server can resolve it with
>> following steps.
>> 1. Get lowest active XID on parent server(XID=110).
>> 2. Connect to foreign server. (Get foreign server information from
>> pg_foreign_server system catalog.)
>> 3. Check if there is prepared transaction with XID less than 110.
>> 4. Rollback the dangling transaction found at #3 step.
>> gid 'px_<random number>_100_<serverid>_<userid>' is prepared on
>> foreign server by transaction 100, rollback it.
>
> Why always rollback any dangling transaction? There can be a case that
> a foreign server has a dangling transaction which needs to be
> committed because the portions of that transaction on the other shards
> are committed.

Right, we can heuristically make a decision whether we do COMMIT or
ABORT on local server.
For example, if COMMIT PREPARED succeeded on at least one foreign
server, the local server return OK to client and the other dangling
transactions should be committed later.
We can find out that we should do either commit or abort the dangling
transaction by checking CLOG.

But we need to handle the case where the CLOG file containing XID
necessary for resolving dangling transaction is truncated.
If the user does VACUUM FREEZE just after remote server crashed, it
could be truncated.

> The way gid is crafted, there is no way to check whether the given
> prepared transaction was created by the local server or not. Probably
> the local server needs to add a unique signature in GID to identify
> the transactions prepared by itself. That signature should be
> transferred to standby to cope up with the fail-over of local server.

Maybe we can use database system identifier in control file.

> In this idea, one has to keep on polling the foreign server to find
> any dangling transactions. In usual scenario, we shouldn't have a
> large number of dangling transactions, and thus periodic polling might
> be a waste.

We can optimize it by storing the XID that is resolved heuristically
into the control file or system catalog, for example.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-10-04 03:25:08 Re: multivariate statistics (v19)
Previous Message Michael Paquier 2016-10-04 00:43:16 Re: Tracking wait event for latches