From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
Cc: | tsunakawa(dot)takay(at)fujitsu(dot)com, movead(dot)li(at)highgo(dot)ca, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Global snapshots |
Date: | 2020-09-04 18:31:14 |
Message-ID: | 3ef7877bfed0582019eab3d462a43275@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2020-07-27 09:44, Andrey V. Lepikhov wrote:
> On 7/27/20 11:22 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
>>
>> US8356007B2 - Distributed transaction management for database systems
>> with multiversioning - Google Patents
>> https://patents.google.com/patent/US8356007
>>
>>
>> If it is, can we circumvent this patent?
>>
>
> Thank you for the research (and previous links too).
> I haven't seen this patent before. This should be carefully studied.
I had a look on the patch set, although it is quite outdated, especially
on 0003.
Two thoughts about 0003:
First, IIUC atomicity of the distributed transaction in the postgres_fdw
is achieved by the usage of 2PC. I think that this postgres_fdw 2PC
support should be separated from global snapshots. It could be useful to
have such atomic distributed transactions even without a proper
visibility, which is guaranteed by the global snapshot. Especially
taking into account the doubts about Clock-SI and general questions
about algorithm choosing criteria above in the thread.
Thus, I propose to split 0003 into two parts and add a separate GUC
'postgres_fdw.use_twophase', which could be turned on independently from
'postgres_fdw.use_global_snapshots'. Of course if the latter is enabled,
then 2PC should be forcedly turned on as well.
Second, there are some problems with errors handling in the 0003 (thanks
to Arseny Sher for review).
+error:
+ if (!res)
+ {
+ sql = psprintf("ABORT PREPARED '%s'", fdwTransState->gid);
+ BroadcastCmd(sql);
+ elog(ERROR, "Failed to PREPARE transaction on remote node");
+ }
It seems that we should never reach this point, just because
BroadcastStmt will throw an ERROR if it fails to prepare transaction on
the foreign server:
+ if (PQresultStatus(result) != expectedStatus ||
+ (handler && !handler(result, arg)))
+ {
+ elog(WARNING, "Failed command %s: status=%d, expected status=%d",
sql, PQresultStatus(result), expectedStatus);
+ pgfdw_report_error(ERROR, result, entry->conn, true, sql);
+ allOk = false;
+ }
Moreover, It doesn't make much sense to try to abort prepared xacts,
since if we failed to prepare it somewhere, then some foreign servers
may become unavailable already and this doesn't provide us a 100%
guarantee of clean up.
+ /* COMMIT open transaction of we were doing 2PC */
+ if (fdwTransState->two_phase_commit &&
+ (event == XACT_EVENT_PARALLEL_COMMIT || event == XACT_EVENT_COMMIT))
+ {
+ BroadcastCmd(psprintf("COMMIT PREPARED '%s'", fdwTransState->gid));
+ }
At this point, the host (local) transaction is already committed and
there is no way to abort it gracefully. However, BroadcastCmd may rise
an ERROR that will cause a PANIC, since it is non-recoverable state:
PANIC: cannot abort transaction 487, it was already committed
Attached is a patch, which implements a plain 2PC in the postgres_fdw
and adds a GUC 'postgres_fdw.use_twophase'. Also it solves these errors
handling issues above and tries to add proper comments everywhere. I
think, that 0003 should be rebased on the top of it, or it could be a
first patch in the set, since it may be used independently. What do you
think?
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
0001-Add-postgres_fdw.use_twophase-GUC-to-use-2PC.patch | text/x-diff | 12.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2020-09-04 18:31:36 | Re: Disk-based hash aggregate's cost model |
Previous Message | Hsu, John | 2020-09-04 18:03:39 | Improve pg_dump dumping publication tables |