Re: uninterruptable loop: concurrent delete in progress within table

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org, Sandro Santilli <strk(at)keybit(dot)net>
Subject: Re: uninterruptable loop: concurrent delete in progress within table
Date: 2014-06-02 17:08:20
Message-ID: 20140602170820.GB24145@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2014-06-02 12:48:01 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > I do wonder if any of the other existing callers of HTSV are affected. I
> > don't understand predicate.c well enough to be sure, but it looks to me
> > like it'd could in theory lead to missed conflicts. Seems fairly
> > unlikely to matter in practice though.
>
> One place where the difference in the new logic would cause a change is
> the tupleIsAlive bit in IndexBuildHeapScan(), when building unique
> indexes. Right now if the tuple is inserted by a remote running
> transaction, and updated by it, we return DELETE_IN_PROGRESS, so we set
> tupleIsAlive = false; so we wouldn't block if we see a tuple with that
> value elsewhere. If we instead return INSERT_IN_PROGRESS we set
> tupleIsAlive = true, and we would block until the inserter is done.

Hm. Maybe I am missing something, but if either INSERT or
DELETE_IN_PROGRESS is returned for a unique index in
IndexBuildHeapScan() we'll wait for the xmin/xmax respectively and
recheck the tuple, right? That recheck will then return a non-ephemeral
status.
The only change in the case you describe that I can see is whether we
loop once or twice?

> I think it'd be better to ensure that DELETE_IN_PROGRESS is returned as
> appropriate in the new XidIsInProgress(xmin) block.

That's still a valid position.

> > I have to say I really hate the amount of repetitive code inside the
> > individual visibility routines. Obviously it's nothing backpatchable and
> > may become moot to a certain degree with the CSN work, but those are
> > pretty close to being unmaintainable.
>
> One thing I'm now wondering while reading this code is those cases where
> XMAX_INVALID is not set, but the value in Xmax is InvalidXid. We have
> discussed these scenarios elsewhere and the conclusion seems to be that
> they are rare but possible and therefore we should cater for them.

Is there really a situation where that combination is possible for
XMAX_INVALID? I don't see how.

Are you possibly thinking of the case where 9.3 assumed wrongly for a
while that a set XMAX_INVALID would be sufficient for freezing?

> (IIRC the scenario is a tuple that is frozen without a full-page write
> and no checksums; if we crash before writing the buffer, we would
> recover the invalid value in Xmax but the hint bit would not become set.
> I think this is not possible anymore after we tweaked the freezing
> protocol.)

I don't think that could have happened before either. After the crash
historically we'd have gone through the same freezing routine. That'd
have either left the whole thing intact (if xmin went backwards) or it'd
have set HEAP_XMAX_INVALID together with setting InvalidTransactionId as
xmax.

> So instead of a check like
> if (tuple->t_infomask & HEAP_XMAX_INVALID)
> we would have something like
> if (HeapTupleHeaderXmaxIsInvalid(tuple))
> which checks both things -- and is easier to read, IMHO.

Note that that bit of code isn't new. It just has moved a tiny
bit. Similar tests are splattered all over tqual.c - so I have a hard
time thinking this could be broken.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2014-06-02 17:17:36 Re: BUG #10500: Cannot restore from a dump when some function is used in public shcema
Previous Message Alvaro Herrera 2014-06-02 16:48:01 Re: uninterruptable loop: concurrent delete in progress within table