BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: bench(at)silentmedia(dot)com
Subject: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Date: 2022-02-08 20:41:54
Message-ID: 17401-9df851bb16dde397@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17401
Logged by: Ben Chobot
Email address: bench(at)silentmedia(dot)com
PostgreSQL version: 12.9
Operating system: Linux (Ubuntu)
Description:

This bug is is almost identical to BUG #17389, which I filed blaming
pg_repack; however, further testing shows the same symptoms using vanilla
REINDEX TABLE CONCURRENTLY.

1. Put some data in a table with a single btree index:
create table public.simple_test (id int primary key);
insert into public.simple_test(id) (select generate_series(1,1000));

2. Set up streaming replication to a secondary db.

3. In a loop on the primary, concurrently REINDEX that table:
while `true`; do psql -tAc "select now(),relfilenode from pg_class where
relname='simple_test_pkey'" >> log; psql -tAc "reindex table concurrently
public.simple_test"; done

4. In a loop on the secondary, have psql query the secondary db for an
indexed value of that table:
while `true`; do psql -tAc "select count(*) from simple_test where id='3';
select relfilenode from pg_class where relname='simple_test_pkey'" || break;
done; date

With those 4 steps, the client on the replica will reliably fail to open the
OID of the index within 30 minutes of looping. ("ERROR: could not open
relation with OID 6715827") When we run the same client loop on the primary
instead of the replica, or if we reindex without the CONCURRENTLY clause,
then the client loop will run for hours without fail, but neither of those
workarounds are options for us in production.

Like I said before, this isn't a new problem - we've seen it since at least
9.5 - but pre-12 we saw it using pg_repack, which is an easy (and
reasonable) scapegoat. But now that we've upgraded to 12 and are still
seeing it using vanilla concurrent reindexing, it seems more clear this is
an actual postgres bug?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2022-02-08 21:23:34 Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Previous Message Tom Lane 2022-02-08 19:19:53 Re: BUG #17391: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL tests fail on OpenBSD 7.0