Re: BUG #17245: Index corruption involving deduplicated entries

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, David Rowley <dgrowley(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-10-29 08:10:41
Message-ID: CAFh8B=ndiT4fW6DRf16TWnw3ur84RYLoNc1WYwzE7LS71H99zg@mail.gmail.com
Lists: pgsql-bugs

I think we experienced something similar.

Now a few words about our setup:
- AWS, i3.8xlarge
- Ubuntu 18.04
- ext4
- It is a sharded database, with 8 clusters in total
- Size of each cluster ~1TB
- Each cluster produces ~3TB of WAL every day (plenty of UPDATEs, about 90%
of which are HOT updates).

Corruption was found on all shards, but the list of affected indexes varies
a bit from shard to shard.

Database schema:
- mostly PRIMARY or UNIQUE keys
- a couple of non-unique btree indexes
- plenty of foreign keys

The timeline:
2021-10-11 - we did the major upgrade from 9.6 to 14
2021-10-14 - executed reindexdb -a --concurrently, which finished
successfully. To speed up reindexing, we used PGOPTIONS="-c
maintenance_work_mem=64GB -c max_parallel_maintenance_workers=4"
2021-10-25 - I noticed that some of the indexes were corrupted, mostly
UNIQUE indexes on int and/or bigint columns.

After that, I identified affected indexes with amcheck, found and removed
duplicated rows, and ran pg_repack on affected tables. The pg_repack was
running with max_parallel_maintenance_workers=0.
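
The check-and-clean-up steps described above can be sketched roughly as
follows. This is only an illustration, not the exact procedure that was
used: the helper names (amcheck_query, duplicate_rows_query) and the example
table "orders" are hypothetical, and the helpers only build SQL text, they
do not connect to a database. The sketch assumes the amcheck extension is
installed (CREATE EXTENSION amcheck) and that table and column names are
trusted, since they are interpolated without quoting.

```python
from typing import Sequence


def amcheck_query(heapallindexed: bool = True) -> str:
    """Build SQL that runs amcheck's bt_index_check() on every valid btree index.

    Note: bt_index_check() raises an error on the first index it finds
    corrupted, so the statement stops there; the failing index has to be
    dealt with (or excluded) before the remaining ones are checked.
    """
    flag = "true" if heapallindexed else "false"
    return (
        "SELECT c.oid::regclass AS index_name,\n"
        f"       bt_index_check(c.oid, {flag})\n"
        "FROM pg_index i\n"
        "JOIN pg_class c ON c.oid = i.indexrelid\n"
        "JOIN pg_am am ON am.oid = c.relam\n"
        "WHERE am.amname = 'btree'\n"
        "  AND i.indisvalid\n"
        "ORDER BY c.relpages DESC;"
    )


def duplicate_rows_query(table: str, key_columns: Sequence[str]) -> str:
    """Build SQL listing key values that occur more than once in a table.

    Assumption: the caller first disables index-based plans in the session
    (SET enable_indexscan = off; SET enable_indexonlyscan = off;
    SET enable_bitmapscan = off;) so the possibly corrupted index is not
    used to answer the query.
    """
    cols = ", ".join(key_columns)
    return (
        f"SELECT {cols}, count(*) AS copies\n"
        f"FROM {table}\n"
        f"GROUP BY {cols}\n"
        "HAVING count(*) > 1;"
    )


if __name__ == "__main__":
    print(amcheck_query())
    # "orders" and "id" are placeholder names for illustration only.
    print(duplicate_rows_query("orders", ["id"]))
```

The resulting statements can be pasted into psql. Passing
heapallindexed=true makes the check slower but also verifies that every heap
tuple has a matching index entry.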

Since we keep an archive of WALs and backups only for the past 6 days, it
is no longer possible to find the respective files that produced the
corruption.

As of today (2021-10-29), amcheck doesn't report any problems.

I hope this information gives you some hints.

Regards,
--
Alexander Kukushkin
