Re: BUG #17245: Index corruption involving deduplicated entries

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-11-12 02:14:48
Message-ID: CAH2-WzkG5Bu8xRw1ew2omeTbVb-7t8cSE0v54mvoTHp8WBm50g@mail.gmail.com
Lists: pgsql-bugs

On Fri, Oct 29, 2021 at 1:10 AM Alexander Kukushkin <cyberdemn(at)gmail(dot)com> wrote:
> - Each cluster produces ~3TB of WAL every day (plenty of UPDATEs, about 90% of which are HOT updates).
>
> Corruption was found on all shards, but the list of affected indexes varies a bit from shard to shard.
>
> Database schema:
> - mostly PRIMARY or UNIQUE keys
> - a couple of non-unique btree indexes
> - plenty of foreign keys

You have said elsewhere that you're sure that this isn't the parallel
VACUUM bug, since you know that you didn't run a manual VACUUM, even
once. So I wonder what the issue might be. Since you deleted duplicate
rows from a unique index, there probably weren't very many affected
rows in total. It sounds like a pretty subtle issue to me
(particularly compared to the parallel VACUUM bug, which wasn't all
that subtle when it hit at all).

If I had to guess, I'd guess that it has something to do with the
snapshot scalability work. Specifically, a recently reported issue
involving confusion about the structure of HOT chains during pruning:

https://www.postgresql.org/message-id/flat/20211110192010.ckvfzz352hsba5xf%40alap3.anarazel.de#4c3d9c9988164f5ea3c15999bcf50ce7

Please join in on the other thread if you have anything more to add.

I could easily be wrong about that, though. You upgraded using
pg_upgrade, right? That's certainly a confounding factor here.

--
Peter Geoghegan
