Quick Links

Re: uninterruptable loop: concurrent delete in progress within table

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Sandro Santilli <strk(at)keybit(dot)net>
Subject:	Re: uninterruptable loop: concurrent delete in progress within table
Date:	2014-06-09 15:22:15
Message-ID:	20140609152215.GB8406@alap3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On 2014-06-07 15:20:07 -0700, Kevin Grittner wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> >> Have we really pinned down the semantics of the the return values
> >> for HeapTupleSatisfiesVacuum() at this point?
> >
> > Well, now that Andres unilaterally redefined HEAPTUPLE_INSERT_IN_PROGRESS
> > as meaning whatever the heck was easiest, I'd say no ;-).

Meh. It's not about defining it what was easiest. The issue is that
there's simply no clear definition of what's correct to return when both
xmin and xmax are in progress. You can argue that
HEAPTUPLE_INSERT_IN_PROGRESS is correct because it's not clear that the
row has ever been inserted. You can argue that DELETE_IN_PROGRESS is
more correct since it has actually been deleted. The latter has the
problem that callers might reasonably expect xmin to actually have
committed.

Note that the previous definition was clearly more bogus (which was
actively causing bugs!): When xmax was set but aborted it still returned
DELETE_IN_PROGRESS.

> > But we have to
> > think about what we do want from it. I'm not sure that we can positively
> > guarantee to distinguish all the possible states of an update-in-progress
> > tuple from outside the updating backend, and it likely would not be worth
> > the trouble to try since the answer could change at any instant anyway.
>
> That sounds like way more than SSI needs anyway.

Note that the current code *does* give an accurate answer in that
case. The only behavioural change is when xmin and xmax are *both* in
progress and are *not* running in the current backend.

Before my change it was:

else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
{
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
return HEAPTUPLE_INSERT_IN_PROGRESS;
/* only locked? run infomask-only check first, for performance */
if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) ||
HeapTupleHeaderIsOnlyLocked(tuple))
return HEAPTUPLE_INSERT_IN_PROGRESS;
/* inserted and then deleted by same xact */
return HEAPTUPLE_DELETE_IN_PROGRESS;
}
now it's
else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple)))
{
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid */
return HEAPTUPLE_INSERT_IN_PROGRESS;
/* only locked? run infomask-only check first, for performance */
if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) ||
HeapTupleHeaderIsOnlyLocked(tuple))
return HEAPTUPLE_INSERT_IN_PROGRESS;
/* inserted and then deleted by same xact */
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple)))
return HEAPTUPLE_DELETE_IN_PROGRESS;
/* deleting subtransaction must have aborted */
return HEAPTUPLE_INSERT_IN_PROGRESS;
}
else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple)))
{
/*
* It'd be possible to discern between INSERT/DELETE in progress
* here by looking at xmax - but that doesn't seem beneficial for
* the majority of callers and even detrimental for some. We'd
* rather have callers look at/wait for xmin than xmax. It's
* always correct to return INSERT_IN_PROGRESS because that's
* what's happening from the view of other backends.
*/
return HEAPTUPLE_INSERT_IN_PROGRESS;
}

Disregarding all that verbiage, the old behaviour was:

If xmin is in progress in this or another backend and xmax was ever set:
HEAPTUPLE_DELETE_IN_PROGRESS

now it's

If xmin is in progress and it's running in the current backend:
return INSERT/DELETE_IN_PROGRESS according to xmax
otherwise if xmin is in progress
INSERT_IN_PROGRESS

All the rest of the cases are unchanged.

> > For the statistical purposes I was on about in the other thread, I would
> > be satisfied if we could distinguish "will be live if all relevant
> > transactions commit" from "will be dead if all relevant transactions
> > commit".
>
> If I'm understanding you, and if HEAPTUPLE_LIVE and HEAPTUPLE_DEAD
> are clear-cut "nothing is in progress" cases, I think that might be
> enough.

Ok. Those haven't changed.

> We know coming in whether the tuple is visible to our own
> transaction;

So I understand it correctly that that predicate.c will never be invoked
HeapTupleSatisfiesVacuum() when the tuple has been created (INSERT, new
UPDATE version) by a currently running backend? Does that include the
case where the looked at tuple has been created by the current backend?

> the question is whether its existence has been or is
> potentially being changed by a top level transaction whose work we
> can't see.

I.e. this is effectively only looking for changes of xmax? Because
otherwise we'd not be looking at the tuple?

> Ideally we would ignore the effects of a subtransaction
> underneath such a top level transaction if that subtransaction has
> rolled back, but it would only compromise efficiency (not
> correctness) if we didn't ignore the work of subtransactions which
> have already been rolled back on an uncommitted top-level transaction.

That's the case.

> >> What the code in predicate.c is using this for is to determine, for
> >> a given tuple which the calling process is reading, whether it is
> >> visible to the calling process but has been deleted by another
> >> transaction (whose work this snapshot can't see), or is not visible
> >> to the calling process and has been inserted by another transaction
> >> (whose work this transaction can't see).
> >
> > I'm not sure why you'd need HeapTupleSatisfiesVacuum for that at all
> > --- it sounds like a plain visible-according-to-query-snapshot test.
>
> That is *one* of the things we need, and it is passed in as a bool
> parameter. We also need to know whether another transaction has
> committed, or might commit, something which would delete a visible
> tuple or insert a tuple which is not visible to us (but which has
> been read on a scan). In other words, would the results of this
> scan be different if run again after commit of the other
> transaction? We're talking about the
> CheckForSerializableConflictOut() function, in case my description
> here isn't clear; perhaps the function comments and/or code will
> clarify things.

I'm a bit confused by this - now you're talking about it touching tuples
which aren't visible to the current backend?
The relevant snippet are those cases here:
case HEAPTUPLE_DELETE_IN_PROGRESS:
xid = HeapTupleHeaderGetUpdateXid(tuple->t_data);
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
xid = HeapTupleHeaderGetXmin(tuple->t_data);
break;
Does it I) cause a bug if you're looking at xmax because
HEAPTUPLE_DELETE_IN_PROGRESS was returned even though xmin hasn't even
committed? That's the case 621a99a fixed. It looks like not because
a) you're chasing the xid to the toplevel xid anyway b) you're ignoring
the case where it was created by our own backend anyway.

Does it II) cause a bug if you're looking at xmin because
HEAPTUPLE_INSERT_IN_PROGRESS was returned even though xmax was set *iff*
xmin is also in progress by the same backend? It doesn't look so either
for I)'s a) reason. I.e. you'd in the end look at the same xid anyway.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: uninterruptable loop: concurrent delete in progress within table at 2014-06-07 22:20:07 from Kevin Grittner

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Jeff Janes	2014-06-09 15:57:35	Re: Many processes blocked at ProcArrayLock!
Previous Message	Andres Freund	2014-06-09 14:33:39	Re: BUG #10542: infinite loop in index.c when trying to reindex system tables (probably corrupted db state)