Re: [HACKERS] logical decoding of two-phase transactions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-07-18 14:56:31
Message-ID: CA+TgmoY2eTJJ5BVgtMkxjwCsrKPbaT_T5g0K5GcmeTk0FeF8DA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 18, 2018 at 10:08 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> The problem is you don't know if a transaction does DDL sometime later, in
> the part that you might not have decoded yet (or perhaps concurrently with
> the decoding). So I don't see how you could easily exclude such transactions
> from the decoding ...

One idea is that maybe the running transaction could communicate with
the decoding process through shared memory. For example, suppose that
before you begin decoding an ongoing transaction, you have to send
some kind of notification to the process saying "hey, I'm going to
start decoding you" and wait for that process to acknowledge receipt
of that message (say, at the next CFI). Once it acknowledges receipt,
you can begin decoding. Then, we're guaranteed that the foreground
process knows that it must be careful about catalog changes. If
it's going to make one, it sends a note to the decoding process and
says, hey, sorry, I'm about to do catalog changes, please pause
decoding. Once it gets an acknowledgement that decoding has paused,
it continues its work. Decoding resumes after commit (or maybe
earlier if it's provably safe).
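
As a rough illustration of the handshake described above, here is a toy
single-process model in Python. It is only a sketch under stated
assumptions, not anything resembling PostgreSQL code: the class, state, and
method names are all hypothetical, the shared-memory slot is a plain object,
and the acknowledgement "at the next CFI" is modeled as an explicit method
call.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()              # nobody is decoding this transaction
    DECODE_REQUESTED = auto()  # decoder asked; backend has not acknowledged
    DECODING = auto()          # backend acknowledged; decoder may proceed
    PAUSE_REQUESTED = auto()   # backend wants to change catalogs
    PAUSED = auto()            # decoder acknowledged; catalog changes are safe


class DecodeHandshake:
    """Hypothetical stand-in for a shared-memory slot (one backend, one decoder)."""

    def __init__(self):
        self.state = State.IDLE

    # Decoder side
    def request_decode(self):
        if self.state is not State.IDLE:
            raise RuntimeError("decoding already requested")
        self.state = State.DECODE_REQUESTED

    def may_decode(self):
        return self.state is State.DECODING

    def decoder_poll(self):
        # Acknowledge a pending pause request, if any.
        if self.state is State.PAUSE_REQUESTED:
            self.state = State.PAUSED

    # Backend side
    def backend_check_for_interrupts(self):
        # Modeled CFI: acknowledge a pending decode request, if any.
        if self.state is State.DECODE_REQUESTED:
            self.state = State.DECODING

    def backend_request_catalog_change(self):
        # Process any pending request first, then ask the decoder to pause.
        self.backend_check_for_interrupts()
        if self.state is State.DECODING:
            self.state = State.PAUSE_REQUESTED

    def catalog_change_allowed(self):
        # Safe only if nobody is decoding or the decoder has confirmed the pause.
        return self.state in (State.IDLE, State.PAUSED)

    def backend_commit(self):
        # After commit the decoder may resume.
        if self.state in (State.PAUSE_REQUESTED, State.PAUSED):
            self.state = State.DECODING
```

The point of the model is the ordering: the decoder never starts before the
backend has acknowledged, and the backend never touches the catalogs before
the decoder has confirmed that it paused. The "or maybe earlier if it's
provably safe" case is deliberately left out.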

> But isn't this (delaying the catalog cleanup etc.) pretty much the original
> approach, implemented by the original patch? Which you also claimed to be
> unworkable, IIRC? Or how is this addressing the problems with broken HOT
> chains, for example? Those issues were pretty much the reason why we started
> looking at alternative approaches, like delaying the abort ...

I don't think so. The original approach, IIRC, was to decode after
the abort had already happened, and my objection was that you can't
rely on the state of anything at that point. The approach here is to
wait until the abort is in progress and then basically pause it while
we try to read stuff, but that seems similarly riddled with problems.
The newer approach could be considered an improvement in that you've
tried to get your hands around the problem at an earlier point, but
it's not early enough. To take a very rough analogy, the original
approach was like trying to install a sprinkler system after the
building had already burned down, while the new approach is like
trying to install a sprinkler system when you notice that the building
is on fire. But we need to install the sprinkler system in advance.
That is, we need to make all of the necessary preparations for a
possible abort before the abort occurs. That could perhaps be done by
arranging things so that decoding after an abort is actually still
safe (e.g. by making it look to certain parts of the system as though
the aborted transaction is still in progress until decoding no longer
cares about it) or by making sure that we are never decoding at the
point where a problematic abort happens (e.g. as proposed above, pause
decoding before doing dangerous things).

> I wonder if disabling HOT on catalogs with wal_level=logical would be an
> option here. I'm not sure how important HOT on catalogs is, in practice (it
> surely does not help with the typical catalog bloat issue, which is
> temporary tables, because that's mostly insert+delete). I suppose we could
> disable it only when there's a replication slot indicating support for
> decoding of in-progress transactions, so that you still get HOT with plain
> logical decoding.

Are you talking about HOT updates, or HOT pruning? Disabling the
former wouldn't help, and disabling the latter would break VACUUM,
which assumes that any tuple not removed by HOT pruning is not a dead
tuple (cf. 1224383e85eee580a838ff1abf1fdb03ced973dc, which was caused
by a case where that wasn't true).

> I'm sure there will be other obstacles, not just the HOT chain stuff, but it
> would mean one step closer to a solution.

Right.

Here's a crazy idea. Instead of disabling HOT pruning or anything
like that, have the decoding process advertise the XID of the
transaction being decoded as its own XID in its PGPROC. Also, using
magic, acquire a lock on that XID even though the foreground
transaction already holds that lock in exclusive mode. Fix the code
(and I'm pretty sure there is some) that relies on an XID appearing in
the procarray only once to no longer make that assumption. Then, if
the foreground process aborts, it will appear to the rest of the
system that it's still running, so HOT pruning won't remove the
XID, CLOG won't get truncated, people who are waiting to update a
tuple updated by the aborted transaction will keep waiting, etc. We
know that we do the right thing for running transactions, so if we
make this aborted transaction look like it is running and are
sufficiently convincing about the way we do that, then it should also
work. That seems more likely to be able to be made robust than
addressing specific problems (e.g. a tuple might get removed!) one by
one.
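
To make the effect concrete, here is a toy Python model of that idea. It is
a drastic simplification and every name in it is hypothetical: the
"procarray" is a plain list, the "is this XID running" test is a scan that
tolerates the same XID being advertised by more than one entry, and "may
this transaction's tuples be removed" is reduced to "the XID no longer looks
running". Locking and CLOG truncation are not modeled at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcEntry:
    name: str
    xid: Optional[int] = None  # XID this entry currently advertises, if any


class ProcArray:
    """Hypothetical, list-backed stand-in for the real procarray."""

    def __init__(self):
        self.procs = []

    def add(self, proc):
        self.procs.append(proc)
        return proc

    def xid_is_running(self, xid):
        # Does not assume an XID is advertised by at most one entry.
        return any(p.xid == xid for p in self.procs)

    def can_remove_tuples_of(self, xid):
        # Simplified rule: only once nobody advertises the XID any more.
        return not self.xid_is_running(xid)


procarray = ProcArray()
backend = procarray.add(ProcEntry("backend", xid=100))
decoder = procarray.add(ProcEntry("decoder", xid=100))  # advertises the same XID

backend.xid = None  # the foreground transaction aborts
still_running_after_abort = procarray.xid_is_running(100)

decoder.xid = None  # decoding no longer cares about the transaction
running_after_decoder_done = procarray.xid_is_running(100)
```

In this model the abort alone changes nothing for observers: the XID keeps
looking in-progress until the decoder drops it, which is the behavior the
proposal is after.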

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
