From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: [HACKERS] logical decoding of two-phase transactions |
Date: | 2018-04-04 16:22:54 |
Message-ID: | 9cac2837-7e58-1cad-3940-27ac0dd7c198@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I think the patch looks mostly fine. I'm about to do a bit more testing
on it, but a few comments. Attached diff shows which the discussed
places / comments more closely.
1) There's a race condition in LogicalLockTransaction. The code does
roughly this:
if (!BecomeDecodeGroupMember(...))
... bail out ...
Assert(MyProc->decodeGroupLeader);
lwlock = LockHashPartitionLockByProc(MyProc->decodeGroupLeader);
...
but AFAICS there is no guarantee that the transaction does not commit
(or even abort) right after the become decode group member. In which
case LogicalDecodeRemoveTransaction might have already reset our pointer
to a leader to NULL. In which case the Assert() and lock will fail.
I've initially thought this can be fixed by setting decodeLocked=true in
BecomeDecodeGroupMember, but that's not really true - that would fix the
race for aborts, but not commits. LogicalDecodeRemoveTransaction skips
the wait for commits entirely, and just resets the flags anyway.
So this needs a different fix, I think. BecomeDecodeGroupMember also
needs the leader PGPROC pointer, but it does not have the issue because
it gets it as a parameter. I think the same thing would work for here
too - that is, use the AssignDecodeGroupLeader() result instead.
2) BecomeDecodeGroupMember sets the decodeGroupLeader=NULL when the
leader does not match the parameters, despite enforcing it by Assert()
at the beginning. Let's remove that assignment.
3) I don't quite understand why BecomeDecodeGroupMember does the
cross-check using PID. In which case would it help?
4) AssignDecodeGroupLeader still sets pid, which is never read. Remove.
5) ReorderBufferCommitInternal does elog(LOG) about interrupting the
decoding of aborted transaction only in one place. There are about three
other places where we check LogicalLockTransaction. Seems inconsistent.
6) The comment before LogicalLockTransaction is somewhat inaccurate,
because it talks about adding/removing the backend to the group, but
that's not what's happening. We join the group on the first call and
then we only tweak the decodeLocked flag.
7) I propose minor changes to a couple of comments.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
2pc-review.diff | text/x-patch | 6.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Teodor Sigaev | 2018-04-04 16:30:50 | Re: [HACKERS] GUC for cleanup indexes threshold. |
Previous Message | Tom Lane | 2018-04-04 16:19:34 | Re: Add support for printing/reading MergeAction nodes |