From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2017-11-24 05:44:52
Message-ID: CAMGcDxeDd4uB+tvaYrtjLgykeOpa-5nwPetNM+prCQ=wpPmPaQ@mail.gmail.com
Lists: pgsql-hackers
Hi Craig,
> I didn't find any sort of sensible situation where locking would pose
> issues. Unless you're taking explicit LOCKs on catalog tables, you should be
> fine.
>
> There may be issues with CLUSTER or VACUUM FULL of non-relmapped catalog
> relations I guess. Personally I put that in the "don't do that" box, but if
> we really have to guard against it we could slightly expand the limits on
> which txns you can PREPARE to any txn that has a strong lock on a catalog
> relation.
>
Well, we don't allow VACUUM FULL of regular tables in transaction
blocks either. I tried "CLUSTER user_table USING pkey"; it works, and it
does not take strong locks on catalog tables that would halt the
decoding process. As mentioned earlier, ALTER TABLE already works
without stalling decoding.
>>
>> The issue is with a concurrent rollback of the prepared transaction.
>> We need a way to ensure that
>> the 2PC does not abort when we are in the midst of a change record
>> apply activity.
>
>
> The *reason* we care about this is that tuples created by aborted txns are
> not considered "recently dead" by vacuum. They can be marked invalid and
> removed immediately due to hint-bit setting and HOT pruning, vacuum runs,
> etc.
>
> This could create an inconsistent view of the catalogs if our prepared txn
> did any DDL. For example, we might've updated a pg_class row, so we created
> a new row and set xmax on the old one. Vacuum may merrily remove our new row
> so there's no way we can find the correct data anymore, we'd have to find
> the outdated row or no row. By my reading of HeapTupleSatisfiesMVCC we'll
> see the old pg_class tuple.
>
> Similar issues apply for pg_attribute etc etc. We might try to decode a
> record according to the wrong table structure because relcache lookups
> performed by the plugin will report outdated information.
>
We actually do the decoding in a PG_TRY/CATCH block, so if there are
any errors we can clean them up in the CATCH block. If it's a prepared
transaction, then we can send an ABORT to the remote side so it can
clean itself up.
> The sanest option here seems to be to stop the txn from aborting while we're
> decoding it, hence Nikhil's suggestions.
>
If we do the cleanup above in the CATCH block, then do we really care?
I guess the issue would be in determining why we reached the CATCH
block: was it a decoding error, a network issue, or something else?
>>
>> * We introduce two new booleans in the TwoPhaseState
>> GlobalTransactionData structure.
>> bool beingdecoded;
>> bool abortpending;
>
>
> I think it's premature to rule out the simple option of doing a LockGXact
> when we start decoding. Improve the error "prepared transaction with
> identifier \"%s\" is busy" to report the locking pid too. It means you
> cannot rollback or commit a prepared xact while it's being decoded, but for
> the intended use of this patch, I think that's absolutely fine anyway.
>
> But I like your suggestion much more than the idea of taking a LWLock on
> TwoPhaseStateLock while decoding a record. Lots of LWLock churn, and LWLocks
> held over arbitrary user plugin code. Not viable.
>
> With your way we just have to take a LWLock once on TwoPhaseState when we
> test abortpending and set beingdecoded. After that, during decoding, we can
> do unlocked tests of abortpending, since a stale read will do nothing worse
> than delay our response a little. The existing 2PC ops already take the
> LWLock and can examine beingdecoded then. I expect they'd need to WaitLatch
> in a loop until beingdecoded was cleared, re-acquiring the LWLock and
> re-checking each time it's woken. We should probably add a field there for a
> waiter proc that wants its latch set, so 2pc ops don't usually have to poll
> for decoding to finish. (Unless condition variables will help us here?)
>
Yes, WaitLatch could do the job here.
> However, let me make an entirely alternative suggestion. Should we add a
> heavyweight lock class for 2PC xacts instead, and leverage the existing
> infrastructure? We already use transaction locks widely after all. That way,
> we just take some kind of share lock on the 2PC xact by xid when we start
> logical decoding of the 2pc xact. ROLLBACK PREPARED and COMMIT PREPARED
> would acquire the same heavyweight lock in an exclusive mode before grabbing
> TwoPhaseStateLock and doing their work.
>
> That way we get automatic cleanup when decoding procs exit, we get wakeups
> for waiters, etc, all for "free".
>
> How practical is adding a lock class?
I am open to suggestions. This looks like it could work decently.
>
> Just to be explicit, this means "tell the downstream the xact has aborted".
> Currently logical decoding does not ever start decoding an xact until it's
> committed, so it has never needed an abort callback on the output plugin
> interface.
>
> But we'll need one when we start doing speculative logical decoding of big
> txns before they commit, and we'll need it for this. It's relatively
> trivial.
>
Yes, it will be a standard wrapper call to implement on both the send
and apply sides.
>>
>> This abort_prepared callback will abort the dummy PREPARED query from
>>
>> step (3) above. Instead of doing this, we could actually check if the
>> 'GID' entry exists and then call ROLLBACK PREPARED on the subscriber.
>> But in that case we can't be sure if the GID does not exist because of
>> a rollback-during-decode-issue on the provider or due to something
>> else. If we are ok with not finding GIDs on the subscriber side, then
>> am fine with removing the DUMMY prepare from step (3).
>
>
> I prefer the latter approach personally, not doing the dummy 2pc xact.
> Instead we can just ignore a ROLLBACK PREPARED for a txn whose gid does not
> exist on the downstream. I can easily see situations where we might manually
> abort a txn and wouldn't want logical decoding to get perpetually stuck
> waiting to abort a gid that doesn't exist, for example.
>
> Ignoring commit prepared for a missing xact would not be great, but I think
> it's sensible enough to ignore missing GIDs for rollback prepared.
>
Yes, that makes sense in the case of ROLLBACK. If we find a missing
GID for a COMMIT PREPARED, we are in for some trouble.
> We'd need a race-free way to do that though, so I think we'd have to extend
> FinishPreparedTransaction and LockGXact with some kind of missing_ok flag. I
> doubt that'd be controversial.
>
Sure.
>
> A couple of other considerations not covered in what you wrote:
>
> - It's really important that the hook that decides whether to decode an xact
> at prepare or commit prepared time reports the same answer each and every
> time, including if it's called after a prepared txn has committed. It
> probably can't look at anything more than the xact's origin replica
> identity, xid, and gid. This also means we need to know the gid of prepared
> txns when processing their commit record, so we can tell logical decoding
> whether we already sent the data to the client at prepare-transaction time,
> or if we should send it at commit-prepared time instead.
>
We already have a filter_prepare_cb hook in place for this. TBH, I
don't think this patch needs to worry about the internals of that
hook; whatever it returns, as long as it returns the same value every
time, we should be good from the logical decoding perspective.
I think if we encode the logic in the GID itself, then the answer will
be consistent every time. For example, if the hook sees a GID with the
prefix '_$Logical_', then it knows it has to PREPARE it; others can be
decoded at commit time.
> - You need to flush the syscache when you finish decoding a PREPARE
> TRANSACTION of an xact that made catalog changes, unless it's immediately
> followed by COMMIT PREPARED of the same xid. Because xacts between the two
> cannot see changes the prepared xact made to the catalogs.
>
> - For the same reason we need to ensure that the historic snapshot used to
> decode a 2PC xact that made catalog changes isn't then used for subsequent
> xacts between the prepare and commit. They'd see the uncommitted catalogs of
> the prepared xact.
>
Yes, we will do TeardownHistoricSnapshot and syscache flush as part of
the cleanup for 2PC transactions.
Regards,
Nikhils
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services