Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Michael Paquier <michael(at)paquier(dot)xyz>, Петър Славов <pet(dot)slavov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY
Date: 2022-05-24 19:01:33
Message-ID: 20220524190133.j6ee7zh4f5edt5je@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2022-05-24 23:38:07 +0500, Andrey Borodin wrote:
>
>
> > On 24 May 2022, at 23:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > With fsync=on, it's much harder to reproduce.
> That explains why it's easier to reproduce on macOS: it seems to ignore fsync.

Yea, one needs wal_sync_method=fsync_writethrough or such :/

> > On 24 May 2022, at 23:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > I suspect the problem might be related to pruning done during the validation
> > scan. Once PROC_IN_SAFE_IC is set, the backend itself will not preserve tids
> > its own snapshot might need. Which will wreak havoc during the validation
> > scan.
>
> I observe that removing PROC_IN_SAFE_IC for index_validate() fixes tests.
> But why is it not a problem for the index_build() scan?

I now suspect it's a problem for both, just more visible for index_validate().

> And I do not understand why it's a problem that a tuple is pruned during the scan... How does this "wreak havoc" happen?

Basically snapshots don't work anymore. If PROC_IN_SAFE_IC is set, that
backend is ignored for the horizon computation for snapshots / on-access HOT
pruning. Which means that rows that are visible to the snapshot can be pruned
away.
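
To make the horizon point concrete, here is a toy model in Python. It is not PostgreSQL code: `Backend`, `in_safe_ic` and `removal_horizon` are hypothetical names, and the real horizon computation is considerably more involved. It only shows how leaving one backend out of the minimum moves the horizon past that backend's own xmin.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    xmin: int          # oldest xid this backend's snapshot still needs
    in_safe_ic: bool   # toy stand-in for the PROC_IN_SAFE_IC flag


def removal_horizon(backends, next_xid):
    """Return the oldest xmin among the backends that are counted.

    In this toy model, row versions deleted by an xid older than the
    returned value are considered removable by pruning.
    """
    counted = [b.xmin for b in backends if not b.in_safe_ic]
    return min(counted, default=next_xid)


s1 = Backend("S1", xmin=100, in_safe_ic=False)
s2 = Backend("S2", xmin=105, in_safe_ic=False)

# S1 is counted, so the horizon cannot advance past its xmin.
horizon_before = removal_horizon([s1, s2], next_xid=110)

# Once S1 is flagged, its xmin no longer holds the horizon back.
s1.in_safe_ic = True
horizon_after = removal_horizon([s1, s2], next_xid=110)
```

With the flag set, anything S1's snapshot needs between xid 100 and 105 is no longer protected in this model.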

One might think that could be safe, after all the row is invisible to all
other backends. The problem is that the validation scan won't see *newer* rows
either, since they're not visible to the snapshot either. And if the new row
version is a HOT tuple, it won't have made an index entry on its own. Boom,
corruption.

Basically:

1) S1 builds index in phase 2
2) S2 inserts tuple t1 (not in the index built in 1), since it was inserted
after that)
3) S2 hot updates tuple t1->t2
4) S1 sets PROC_IN_SAFE_IC, builds snapshot, starts validation scan (phase 3)
5) S2 hot updates tuple t2->t3
6) Either S1 or S2 performs hot pruning, redirecting t1 to t3. This is only
possible because PROC_IN_SAFE_IC caused S1's ->xmin to be ignored
7) S1 checks t1->t3, finds that t3 is too new for the snapshot, doesn't create
an index entry
8) corruption
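
The sequence above can be replayed in a small Python toy model. This is a sketch under strong simplifying assumptions, not PostgreSQL code: every transaction commits instantly, a snapshot is reduced to a single xid cutoff, and `Version`, `prune`, `visible` and `run` are hypothetical names. The xid values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    name: str
    xmin: int                   # xid that created this row version
    xmax: Optional[int] = None  # xid that replaced it, if any


def prune(chain, horizon):
    """Drop leading HOT chain members whose replacing xid is older than
    the horizon; the root line pointer then redirects to what is left."""
    live = list(chain)
    while len(live) > 1 and live[0].xmax is not None and live[0].xmax < horizon:
        live.pop(0)
    return live


def visible(version, snapshot):
    """Toy visibility: created before the cutoff and not replaced before it."""
    created = version.xmin < snapshot
    replaced = version.xmax is not None and version.xmax < snapshot
    return created and not replaced


def run(s1_counted_in_horizon):
    t1 = Version("t1", xmin=101)                 # step 2: S2 inserts t1
    t2 = Version("t2", xmin=102); t1.xmax = 102  # step 3: HOT update t1->t2
    snapshot = 103                               # step 4: S1 takes its snapshot
    t3 = Version("t3", xmin=103); t2.xmax = 103  # step 5: HOT update t2->t3
    next_xid = 104
    # Step 6: the pruning horizon either respects S1's snapshot or ignores it.
    horizon = snapshot if s1_counted_in_horizon else next_xid
    chain = prune([t1, t2, t3], horizon)
    # Step 7: the validation scan adds an index entry only if some
    # surviving chain member is visible to S1's snapshot.
    indexed = any(visible(v, snapshot) for v in chain)
    return [v.name for v in chain], indexed


safe = run(s1_counted_in_horizon=True)
broken = run(s1_counted_in_horizon=False)
```

In this model, `safe` keeps t2 in the chain and the scan indexes it, while `broken` leaves only t3, which the snapshot cannot see, so the chain ends up with no index entry at all.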

Greetings,

Andres Freund
