Re: MultiXact bugs

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <kgrittn(at)ymail(dot)com>
Subject: Re: MultiXact bugs
Date: 2013-11-25 15:36:19
Message-ID: 20131125153619.GA6597@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund wrote:

Hi,

> The attached pgbench testcase can reproduce two issues:

Thanks.

> 2) we frequently error out in heap_lock_updated_tuple_rec() with
> ERROR: unable to fetch updated version of tuple
>
> That's because when we're following a ctid chain, it's perfectly
> possible for the updated version of a tuple to already have been
> vacuumed/pruned away if the the updating transaction has aborted.
>
> Also looks like a 9.3+ issues to me, slightly higher impact, but in the
> end you're just getting some errors under concurrency.

Yes, I think this is 9.3 only. First attachment shows my proposed
patch, which is just to report success to caller in case the tuple
cannot be acquired by heap_fetch. This is OK because if by the time we
check the updated version of a tuple it is gone, it means there is no
further update chain to follow and lock.

> 1) (takes a bit)
> TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)))", File:/pruneheap.c", Line: 601)
>
> That's because HeapTupleHeaderGetUpdateXid() ignores aborted updaters
> and returns InvalidTransactionId in that case, but
> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS...

Interesting. This is a very narrow race condition: when we call
HeapTupleSafisfiesVacuum the updater is still running, so it returns
HEAPTUPLE_DELETE_IN_PROGRESS; but it aborts just before we read the
tuple's update Xid. So that one returns InvalidXid and that causes the
failure. I was able to hit this race condition very quickly by adding a
pg_usleep(1000) in heap_prune_chain(), in the
HEAPTUPLE_DELETE_IN_PROGRESS case, before trying to acquire the update
Xid.

There is no way to close the window, but there is no need; if the
updater aborted, we don't need to mark the page prunable in the first
place. So we can just check the return value of
HeapTupleHeaderGetUpdateXid and if it's InvalidXid, don't set the
prunable bit. The second attachment below fixes the bug that way.

I checked for other cases where the update Xid is checked after
HeapTupleSatisfiesVacuum returns HEAPTUPLE_DELETE_IN_PROGRESS. As far
as I can tell, the only one that would be affected is the one in
predicate.c. It is far from clear to me what is the right thing to do
in these cases; the simplest idea is to return without reporting a
failure if the updater aborted, just as above; but I wonder if this
needs to be conditional on "visible". I added a pg_usleep() before
acquiring the update Xid in the relevant case, but the isolation test
cases didn't hit the problem (I presume there is no update/delete in
these test cases, but I didn't check). I defer to Kevin on this issue.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
heap_fetch-failure-is-good.patch text/x-diff 926 bytes
dont-set-prunable-to-invalidxid.patch text/x-diff 1.2 KB
predicate-stuff-i-have-no-idea-about.patch text/x-diff 463 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-11-25 15:40:13 Re: docbook-xsl version for release builds
Previous Message Magnus Hagander 2013-11-25 15:32:22 Re: docbook-xsl version for release builds