Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Петър Славов <pet(dot)slavov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Date: 2022-05-24 18:46:54
Message-ID: 20220524184654.c2zt6coy4s5a6rnh@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2022-05-24 10:38:14 -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2022 at 9:37 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Do we have any idea what really causes the corruption?
>
> I don't think so.

I think I found it: https://postgr.es/m/20220524183705.cmgbqq32z63qynhe%40alap3.anarazel.de
afaict PROC_IN_SAFE_IC is completely broken right now. Any concurrent prune
can remove rows that are visible to the snapshot held by the
PROC_IN_SAFE_IC backend. Which basically makes them "fair weather snapshots" -
they work only as long as there is no concurrent activity.
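
To make the failure mode concrete, here is a toy model of it. This is not
PostgreSQL code: the Backend class, removal_horizon() and prunable() are
hypothetical names invented for this sketch, and the transaction ids are made
up. It only assumes that pruning may remove a dead row version once its
deleter precedes the oldest xmin of the backends it takes into account, and
that a backend flagged "safe IC" is left out of that computation.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    xmin: int              # oldest transaction id this backend's snapshot still needs
    safe_ic: bool = False  # toy stand-in for the PROC_IN_SAFE_IC flag


def removal_horizon(backends, ignore_safe_ic):
    """Oldest xmin pruning has to respect, optionally skipping safe-IC backends."""
    considered = [b.xmin for b in backends if not (ignore_safe_ic and b.safe_ic)]
    return min(considered, default=float("inf"))


def prunable(deleted_by_xid, horizon):
    # A dead row version may be removed once its deleter precedes the horizon.
    return deleted_by_xid < horizon


# Hypothetical scenario: the index build took its snapshot at xmin=100.
# A row was deleted afterwards by xid 105, so that snapshot still sees the row.
# A second, ordinary backend has xmin=110.
backends = [
    Backend("index build", xmin=100, safe_ic=True),
    Backend("other session", xmin=110),
]

horizon_ignoring = removal_horizon(backends, ignore_safe_ic=True)
horizon_honoring = removal_horizon(backends, ignore_safe_ic=False)

# Ignoring the flagged backend lets pruning remove a row its snapshot needs.
removed_when_ignored = prunable(105, horizon_ignoring)
removed_when_honored = prunable(105, horizon_honoring)

print(horizon_ignoring, removed_when_ignored)
print(horizon_honoring, removed_when_honored)
```

In this sketch the row deleted by xid 105 is pruned only when the flagged
backend's xmin is ignored, which is the case where the index build would then
miss the row.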

Similar behavior is fine for VACUUM - it doesn't use a snapshot / need a
consistent view of the table. But not for CIC - otherwise it could just use
SnapshotAny or such.

I don't really see a realistic alternative other than reverting at this
point. I think this needs to be rethought fairly fundamentally.

> Andrey's tap test fails for me on 14 as expected, and does so reliably
> -- so there is a fairly good reproducer for this.
>
> I don't have time to debug this right now (...), but it would probably be
> straightforward to get an RR recording of the failure.

I tried that, but it didn't repro under rr within 15min or so.

> (need to work on my pgCon talk)

Good luck :)

> > One thing that'd be worth excluding is the use of parallel index builds.
>
> I can rule out a problem with parallel index builds -- disabling them
> in the tap test doesn't alter the outcome.

Good. Just to clarify: I was suspicious of PROC_IN_SAFE_IC being set
incoherently in parallel workers or such, not of parallel index builds "in
general".

Greetings,

Andres Freund
