From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-07-19 08:55:19
Message-ID: CAMGcDxeZ+BCRb7xn+9VrSaceV5oxOCcbEjxH8P95TLVfUD+v8A@mail.gmail.com
Lists: pgsql-hackers

Hi Robert and Tomas,

It seems clear to me that the decodeGroup list of decoding backends
waiting on the backend doing the transaction of interest is not a
favored approach here. Note that I arrived at this approach after
trying various other approaches/iterations. I was especially enthused
to see the lockGroupLeader implementation in the code and based this
decodeGroup implementation on the same premise, although our
requirements are simpler: we only need a list of waiters in the main
transaction backend process.

Sure, there might be some locking issues in the code, and I am
willing to try to work them out. However, if the decodeGroup approach
of interlocking abort processing with the decoding backends is itself
considered suspect, then fixing those issues might be another waste
of time.

> I think it's inevitable that any solution that is based on pausing
> decoding might have to wait for a theoretically unbounded time for
> decoding to get back to a point where it can safely pause. That is
> one of several reasons why I don't believe that any solution based on
> holding off aborts has any chance of being acceptable -- mid-abort is
> a terrible time to pause. Now, if the time is not only theoretically
> unbounded but also in practice likely to be very long (e.g. the
> foreground transaction could easily have to wait minutes for the
> decoding process to be able to process the pause request), then this
> whole approach is probably not going to work. If, on the other hand,
> the time is theoretically unbounded but in practice likely to be no
> more than a few seconds in almost every case, then we might have
> something. I don't know which is the case.

We have tried to minimize the pausing requirement by holding the
"LogicalLock" only while the decoding activity accesses catalog
tables. Decoding goes ahead only if it gets the logical lock; it
reads the catalog and unlocks immediately. If the decoding backend
does not get the "LogicalLock", it stops decoding the current
transaction. So the window during which an abort has to wait is
pretty short in practical scenarios.
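
To make that concrete, here is a rough pseudo-C sketch of the
protocol. The LogicalLockTransaction/LogicalUnlockTransaction names
are only meant to evoke the interlock primitives; the body below is
illustrative, not the actual patch code:

static bool
decode_change_needing_catalog(ReorderBufferTXN *txn)
{
    /*
     * Hold the "LogicalLock" only across the catalog access itself,
     * keeping the window during which an abort must wait as short
     * as possible.
     */
    if (!LogicalLockTransaction(txn))
    {
        /*
         * The transaction is aborting (or has already aborted); stop
         * decoding it rather than read catalog contents that may
         * already have been cleaned up.
         */
        return false;
    }

    /* ... perform the catalog lookups this change requires ... */

    LogicalUnlockTransaction(txn);
    return true;
}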

> It probably depends on
> where you put the code to handle pause requests, and I'm not sure what
> options are viable. For example, if there's a loop that eats WAL
> records one at a time, and we can safely pause after any given
> iteration of that loop, that sounds pretty good, unless a single
> iteration of that loop might hang inside of a network I/O, in which
> case it sounds ... less good, probably?

It is precisely to avoid waiting inside network I/O that we take the
lock only around catalog access, as described above.
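
To illustrate (every name below is a placeholder, not a real
function), the record-eating loop would look roughly like this, with
the lock confined to the decode step and the network send outside it:

for (;;)
{
    XLogRecord *record = ReadNextRecord();  /* WAL read, no network */

    if (record == NULL)
        break;

    /*
     * Any catalog reads inside DecodeRecord() take the short
     * LogicalLock window shown earlier. The downstream (network)
     * send happens after decoding, outside any lock, so an aborting
     * transaction never ends up waiting on our socket.
     */
    DecodeRecord(record);
    SendPendingChanges();
}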

> There are several issues there. The second and third ones boil down
> to this: As soon as the system thinks that your transaction is no
> longer in process, it is going to start making decisions based on
> whether that transaction committed or aborted. If it thinks your
> transaction aborted, it is going to feel entirely free to make
> decisions that permanently lose information -- like removing tuples or
> overwriting CTIDs or truncating CLOG or killing index entries. I
> doubt it makes any sense to try to fix each of those problems
> individually -- if we're going to do something about this, it had
> better be broad enough to nail all or nearly all of the problems in
> this area in one fell swoop.

Agreed, this is the crux of the issue. Decisions that permanently
lose information, regardless of any decoding going on around that
transaction, are what led us down this rabbit hole in the first
place.

>> A dumb question - would this work with subtransaction-level aborts? I mean,
>> a transaction that does some catalog changes in a subxact which then however
>> aborts, but then still continues.
>
> That having been said, I cannot immediately see any reason why the
> idea that I sketched there couldn't be made to work just as well or
> poorly for subtransactions as it would for toplevel transactions. I
> don't really know that it will work even for toplevel transactions --
> that would require more thought and careful study than I've given it
> (or, given that this is not my patch, feel that I should need to give
> it). However, if it does, and if there are no other problems that
> I've missed in thinking casually about it, then I think it should be
> possible to make it work for subtransactions, too. Likely, as the
> decoding process first encountered each new sub-XID, it would need to
> magically acquire a duplicate lock and advertise the subxid just as it
> did for the toplevel XID, so that at any given time the set of XIDs
> advertised by the decoding process would be a subset (not necessarily
> proper) of the set advertised by the foreground process.
>

I am ready to go back to the drawing board and have another stab at
this pesky little (yet large) issue :-)
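
To restate the invariant from the quoted paragraph in pseudo-C (every
name here is hypothetical, sketching the idea rather than any
existing API):

static void
decoding_saw_new_subxid(PGPROC *decodingProc, TransactionId subxid)
{
    /*
     * Mirror the foreground backend: acquire a duplicate lock on the
     * sub-XID and advertise it from the decoding process, preserving
     * the invariant that the XIDs advertised by the decoding process
     * are always a subset (not necessarily proper) of those
     * advertised by the foreground process.
     */
    AcquireDuplicateXidLock(subxid);
    AdvertiseXid(decodingProc, subxid);
}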

> To try to be a little clearer about my overall position, I am
> suggesting that you (1) abandon the current approach and (2) make sure
> that everything is done by making sufficient preparations in advance
> of any abort rather than trying to cope after it's already started. I
> am also suggesting that, to get there, it might be helpful to (a)
> contemplate communication and active cooperation between the running
> process and the decoding process(es), but it might turn out not to be
> needed and I don't know exactly what needs to be communicated, (b)
> consider whether there's a reasonable way to make it look to other
> parts of the system like the aborted transaction is still running, but
> this also might turn out not to be the right approach, (c) consider
> whether logical decoding already does or can be made to use historical
> catalog snapshots that only see command IDs prior to the current one
> so that incompletely-made changes by the last CID aren't seen if an
> abort happens. I think there is a good chance that a full solution
> involves more than one of these things, and maybe some other things I
> haven't thought about. These are ideas, not a plan.
>

I will think more along these lines and see if we can get something
workable.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
