Re: BUG #17245: Index corruption involving deduplicated entries

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-10-28 22:39:02
Message-ID: CAH2-Wzn3oMzc+ReTyFB6N77o3ip65CB3gNcF4NVQkvgSN+DXRg@mail.gmail.com
Lists: pgsql-bugs

On Thu, Oct 28, 2021 at 2:31 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> This makes me wonder if the issue could be that we're losing writes / that
> something is reading old page versions (e.g. due to filesystem bug). If both
> heap and index are vacuumed, but the index write is lost, this'd be what we
> see, right?

Right, but that just doesn't seem to fit. That was the first question I asked.

> Another way this could happen is if we got the wrong relation size for either
> index or table, and a vacuum scan doesn't scan the whole table or index.

I doubt that, since the heap blocks involved include heap block 0. On
the table/indexes actually affected by this, the indexes are riddled
with corruption. But every other table seems fine (at least as far as
anybody knows).

> I've not yet read the whole thread, but if not done, it seems like a good idea
> to use pg_waldump and grep for changes to the relevant heap / index
> pages. That might give us more information about what could have happened.

I think that there is a fairly high likelihood that that alone will be
enough to diagnose the bug.
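
A minimal sketch of that kind of filtering, for anyone following along.
It assumes pg_waldump's text output marks main-fork block references as
"rel <spcOid>/<dbOid>/<relfilenode> blk <n>"; the helper name
matching_records and the relfilenode/block values in the usage note are
hypothetical, not taken from the affected system.

```python
import re

# Assumed shape of a main-fork block reference in pg_waldump text output:
#   "blkref #0: rel 1663/16384/16385 blk 0"
# References to other forks carry "fork <name>" before "blk" and are
# deliberately not matched here.
BLKREF = re.compile(r"rel (\d+)/(\d+)/(\d+) blk (\d+)")


def matching_records(lines, relfilenode, blocks):
    """Yield WAL record lines that touch any of `blocks` in `relfilenode`.

    `lines` is any iterable of pg_waldump output lines (a file object,
    sys.stdin, or a list of strings).
    """
    wanted = set(blocks)
    for line in lines:
        for _spc, _db, rel, blk in BLKREF.findall(line):
            if int(rel) == relfilenode and int(blk) in wanted:
                yield line.rstrip("\n")
                break  # report each record once, even with several matches
```

Feeding it the saved pg_waldump output once with the heap relfilenode and
once with the index relfilenode, e.g.
matching_records(open("waldump.txt"), 16385, [0]), gives a per-page
history that can be compared against what the index pages claim.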

> There were quite a few changes around the separation between heap and index
> vacuuming in 14. I wonder if there's potentially something broken around
> repeatedly vacuuming the heap without doing index vacuums or such.

I did ask myself that question earlier today, but quickly rejected the
idea. There is very little mechanism involved with that stuff. It's
very hard to imagine what could break. The code for this in
lazy_vacuum() is quite simple.

> It's also possible that there's something wrong in that darned path that
> handles recently-dead tuples.

That sounds much more likely to me.

--
Peter Geoghegan
