Re: Vacuum/visibility is busted

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 08:55:31
Message-ID: CABOikdPr8-29NEta1grOi7=FyVVtc5gA5NLtOX6O6M=gLDsDqA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 11:09 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
> that vacuum/visibility was busted. But it wasn't his patch that busted it.
> As far as I can tell, the bad commit was in the range
> 692079e5dcb331..168d3157032879
>
> Since a run takes 12 to 24 hours, it will take a while to refine that
> interval.
>
> I was testing using the framework explained here:
>
> http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com
>
> Except that I increased JJ_torn_page to 8000, so that autovacuum has a
> chance to run to completion before each crash; and I turned off archive_mode
> as it was not relevant and caused annoying noise. As far as I know,
> crashing is entirely irrelevant to the current problem, but I just used and
> adapted the framework I had at hand.
>
> A tarball of the data directory is available below, for those who would
> like to do a forensic inspection. The table jjanes.public.foo is clearly in
> violation of its unique index.

The xmins of all the duplicate tuples look dangerously close to 2^31.
I wonder if XID wraparound has anything to do with it.
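
For readers less familiar with why 2^31 matters here: XIDs are 32-bit
counters compared modulo 2^32, so an XID only looks "older" than another
while the two are less than 2^31 apart. The following is a minimal sketch
of that circular comparison, not a copy of the server code. The names
xid_precedes and FIRST_NORMAL_XID are hypothetical, and the assumption
that XIDs below 3 are special values compared as plain integers is stated
in the comments.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption for this sketch: XIDs below 3 are reserved special values. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/*
 * Returns true if XID a is logically older than XID b.
 * Normal XIDs are compared on a circle of size 2^32: the unsigned
 * difference is reinterpreted as signed, so a precedes b only when
 * it is fewer than 2^31 steps behind b.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    /* Special XIDs are compared as plain unsigned integers. */
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;

    return (int32_t) (a - b) < 0;
}
```

With this rule, a tuple whose xmin is more than 2^31 transactions behind
the current XID flips from "in the past" to "in the future", which is the
kind of visibility change that could hide or expose tuples unexpectedly.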

Index scans do not return any duplicates; you need to force a
sequential scan to see them. Suspecting that the page-level VM bit
might be corrupted, I tried to REINDEX the table to see whether it
would complain about unique key violations, but that crashes the
server with:

TRAP: FailedAssertion("!(((bool) ((root_offsets[offnum - 1] !=
((OffsetNumber) 0)) && (root_offsets[offnum - 1] <= ((OffsetNumber)
(8192 / sizeof(ItemIdData)))))))", File: "index.c", Line: 2482)
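
The expanded expression in the trap appears to be an offset-number
validity check on root_offsets[offnum - 1]: the value must be nonzero and
no larger than the block size divided by the line pointer size. Below is a
minimal sketch of that check for reading the trap. The names
SKETCH_ITEMID_SIZE and offset_number_is_valid are hypothetical, and the
4-byte line pointer size is an assumption stated in the comments.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

#define BLCKSZ 8192
/* Assumption for this sketch: sizeof(ItemIdData) is 4 bytes. */
#define SKETCH_ITEMID_SIZE 4

#define InvalidOffsetNumber ((OffsetNumber) 0)
#define MaxOffsetNumber ((OffsetNumber) (BLCKSZ / SKETCH_ITEMID_SIZE))

/*
 * Mirrors the condition in the failed assertion: an offset is valid
 * when it is not the invalid marker (0) and does not exceed the
 * maximum number of line pointers that fit on one page.
 */
static bool
offset_number_is_valid(OffsetNumber off)
{
    return off != InvalidOffsetNumber && off <= MaxOffsetNumber;
}
```

Read this way, the failure means the root offset recorded for that tuple
was either 0 or out of range for the page.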

I will look into it further, but thought this might help others spot
the problem.

Thanks,
Pavan

P.S. BTW, you need to connect as user "jjanes" to the database
"jjanes" to see the offending table.

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
