Re: [HACKERS] Transactions involving multiple postgres foreign servers

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date: 2017-12-13 01:47:00
Message-ID: CAD21AoBqqjFETNQ0m3TQExC1+ZTT4L0VSo6uddHSvouP+p1LDQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> The question I have is how we would deal with a foreign server that is
>>> unavailable for a long time due to a crash, an extended network outage,
>>> etc. For example, the foreign server crashed or got disconnected after
>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>> blocked for much longer without the user having any idea of what's
>>> going on. Maybe we should add some timeout.
>>
>> After more thought, I agree with adding some timeout. I can imagine
>> there are users who want the timeout, for example, those who cannot accept
>> even a few seconds of latency. If the timeout occurs, the backend unlocks the
>> foreign transactions and breaks out of the loop. The resolver process will
>> keep trying to resolve the foreign transactions at a certain interval.
>
> I don't think a timeout is a very good idea. There is no timeout for
> synchronous replication and the issues here are similar. I will not
> try to block a patch adding a timeout, but I think it had better be
> disabled by default and have very clear documentation explaining why
> it's really dangerous. And this is why: with no timeout, you can
> count on being able to see the effects of your own previous
> transactions, unless at some point you sent a query cancel or got
> disconnected. With a timeout, you may or may not see the effects of
> your own previous transactions depending on whether or not you hit the
> timeout, which you have no sure way of knowing.
>
>>>> transactions after the coordinator server recovered. On the other
>>>> hand, to let subsequent reads see a consistent result in such a
>>>> situation, we could, for example, disallow backends from issuing SQL
>>>> to a foreign server while a foreign transaction on that server
>>>> remains unresolved.
>>>
>>> +1 for the last sentence. If we do that, we don't need the backend to
>>> be blocked by resolver since a subsequent read accessing that foreign
>>> server would get an error and not inconsistent data.
>>
>> Yeah, however the disadvantage of this is that we manage foreign
>> transactions per foreign server. If a transaction that modified even
>> one table remains as an in-doubt transaction, we cannot issue any
>> SQL that touches that foreign server. Can we raise an error in
>> ExecInitForeignScan()?
>
> I really feel strongly we shouldn't complicate the initial patch with
> this kind of thing. Let's make it enough for this patch to guarantee
> that either all parts of the transaction commit eventually or they all
> abort eventually. Ensuring consistent visibility is a different and
> hard project, and if we try to do that now, this patch is not going to
> be done any time soon.
>

Thank you for the suggestion.

I was really wondering if we should add a timeout to this feature.
It's a common concern that we want to put a timeout on a critical
section. But currently we don't have such a timeout for either
synchronous replication or writing WAL. I can imagine there will be
users who want a timeout for such cases, but obviously it makes this
feature more complex. Anyway, even if we add a timeout to this feature,
we can make it a separate patch and feature. So I'd like to keep
it simple as a first step. This patch guarantees that the transaction
either commits or rolls back on all foreign servers, unless the user
cancels.
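
To make that guarantee concrete, here is a rough sketch of the behaviour
I have in mind. It is only an illustrative model in Python, not code from
the patch: ForeignServer, commit_on_all and cancel_requested are
hypothetical names, and a real foreign server would of course be reached
through the FDW rather than an in-memory object. It prepares on every
server, then keeps retrying the commit with no timeout, and only a cancel
can leave transactions unresolved.

```python
class ForeignServer:
    """Hypothetical stand-in for a foreign server taking part in 2PC."""

    def __init__(self, name, failures_before_commit=0):
        self.name = name
        self.state = "idle"
        # Number of commit attempts that fail, simulating an outage.
        self._failures_left = failures_before_commit

    def prepare(self):
        self.state = "prepared"

    def commit_prepared(self):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError(f"{self.name} unreachable")
        self.state = "committed"


def commit_on_all(servers, cancel_requested=lambda: False):
    """Prepare on every server, then retry the commit until all are resolved.

    There is no timeout: the loop ends early only if cancel_requested()
    returns True. Returns the names of servers still left in doubt.
    """
    for server in servers:
        server.prepare()

    pending = list(servers)
    while pending:
        if cancel_requested():
            return [server.name for server in pending]
        still_pending = []
        for server in pending:
            try:
                server.commit_prepared()
            except ConnectionError:
                # Keep the prepared transaction and try again next round.
                still_pending.append(server)
        pending = still_pending
    return []
```

With a server that fails a few times, commit_on_all() still returns an
empty list once every server has committed. If the user cancels first, it
returns the servers whose prepared transactions remain for the resolver.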

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
