Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Ben Chobot <bench(at)silentmedia(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Date: 2022-02-11 01:38:34
Message-ID: YgW+Gl+VC+QGFZF4@paquier.xyz
Lists: pgsql-bugs

On Thu, Feb 10, 2022 at 04:12:40PM -0800, Andres Freund wrote:
> I'm pretty sure the problem is on the primary. Looking through
> ReindexRelationConcurrently() I think I found at least two problems:
>
> 1) We don't WAL log snapshot conflicts, afaict (i.e. the
> WaitForOlderSnapshots() in phase 3). Which I think means that the new index
> can end up being used "too soon" in pre-existing snapshots on the standby.
>
> I don't think this is the problem this thread is about, but it's definitely a
> problem.
>
> 2) WaitForLockersMultiple() in phase 5 / 6 isn't WAL logged. Without waiting
> for sessions to see the results of Phase 4, 5, we can mark the index as dead
> (phase 5) and drop it (phase 6), while there are ongoing accesses.
>
> I think this is the likely cause of the reported bug.

Yep, I was planning to look into this problem starting next week, FWIW;
I just lacked the time/energy to do so earlier. And the release was
shipping anyway, so there is plenty of time.

My impression is that we don't really need to change the WAL format,
given the existing APIs we already have, or that in the worst case
it would be possible to make things backward-compatible enough that it
would not be a worry as long as the standbys are updated before the
primaries.
--
Michael
