Re: assertion failure 9.3.4

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: assertion failure 9.3.4
Date: 2014-04-22 21:01:40
Message-ID: 20140422210140.GI25695@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Andres Freund wrote:
> On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
> >
> > On 04/21/2014 02:54 PM, Andres Freund wrote:
> > >Hi,
> > >
> > >I spent the last two hours poking around in the environment Andrew
> > >provided and I was able to reproduce the issue, find an assert to
> > >reproduce it much faster and find a possible root cause.
> >
> >
> > What's the assert that makes it happen faster? That might help a lot in
> > constructing a self-contained test.
>
> Assertion and *preliminary*, *hacky* fix attached.

Thanks for the analysis and patches. I've been playing with this on my
own a bit, and one thing I just noticed is that, at least for
heap_update, I cannot reproduce a problem when the xmax is originally a
multixact, so AFAICT not that many places need patching.

Some testing later, I think the issue only occurs if we determine that
we don't need to wait for the xid/multi to complete, because otherwise
the wait itself saves us. (It's easy to cause the problem by adding a
breakpoint at heapam.c:3325, i.e. just before re-acquiring the buffer
lock, and then having transaction A lock the tuple for key share, then
transaction B update the tuple, which stops at the breakpoint, then
transaction A also update the tuple, and finally releasing transaction B.)

For now I offer a cleaned up version of your patch to add the assertion
that multis don't contain multiple updates. I considered the idea of
making this #ifdef USE_ASSERT_CHECKING, because it has to walk the
complete array of members; and then have full elogs in MultiXactIdExpand
and MultiXactIdCreate, which are lighter because they can check more
easily. But on second thought I refrained from doing that, because
surely the arrays are not that large anyway, are they?
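
To make the shape of that check concrete, here is a minimal standalone
sketch in C. It is not the attached patch: the type and function names
(multi_member, member_status, multi_has_at_most_one_update) are
hypothetical stand-ins for illustration, and the assumption that only the
last two statuses count as updates is stated in the code. It only shows
why the check has to walk the complete array of members.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
typedef unsigned int xid_t;

typedef enum
{
    MEMBER_FOR_KEY_SHARE,
    MEMBER_FOR_SHARE,
    MEMBER_FOR_NO_KEY_UPDATE,
    MEMBER_FOR_UPDATE,
    MEMBER_NO_KEY_UPDATE,
    MEMBER_UPDATE
} member_status;

typedef struct
{
    xid_t         xid;
    member_status status;
} multi_member;

/*
 * Assumption for this sketch: only the last two statuses represent an
 * actual update; the "FOR_*" statuses are plain tuple locks.
 */
static bool
member_is_update(member_status status)
{
    return status == MEMBER_NO_KEY_UPDATE || status == MEMBER_UPDATE;
}

/*
 * Walk the complete member array and report whether it contains at most
 * one updating member.  Returns false as soon as a second update is seen.
 */
static bool
multi_has_at_most_one_update(const multi_member *members, size_t nmembers)
{
    bool seen_update = false;

    for (size_t i = 0; i < nmembers; i++)
    {
        if (!member_is_update(members[i].status))
            continue;
        if (seen_update)
            return false;
        seen_update = true;
    }
    return true;
}
```

The cost is linear in the number of members, which is the reason for
wondering whether to keep it behind USE_ASSERT_CHECKING; a caller could
wrap the result in an assertion or raise an error when it returns false.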

I think I should push this patch first, so that Andrew and Josh can try
their respective test cases, which should start throwing errors, and then
push the actual fixes. Does that sound okay?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
complain-multi-updates.patch text/x-diff 2.3 KB
