Re: race condition in pg_class

From: Noah Misch <noah(at)leadboat(dot)com>
To: Smolkin Grigory <smallkeen(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: race condition in pg_class
Date: 2023-10-27 18:48:32
Message-ID: 20231027184832.9b.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote:
> On Wed, Oct 25, 2023 at 01:39:41PM +0300, Smolkin Grigory wrote:
> > We are running PG13.10 and recently we have encountered what appears to be
> > a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
> > some other catalog-writer, possibly ANALYZE.
> > The problem is that after successfully creating index on relation (which
> > previosly didnt have any indexes), its pg_class.relhasindex remains set to
> > "false", which is illegal, I think.

It's damaging. The table will behave like it has no indexes. If something
adds an index later, old indexes will reappear, corrupt, having not received
updates during the relhasindex=false era. ("pg_amcheck --heapallindexed" can
detect this.)

> > Index was built using the following statement:
> > ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);
>
> This is going to be a problem with any operation that does a transactional
> pg_class update without taking a lock that conflicts with ShareLock. GRANT
> doesn't lock the table at all, so I can reproduce this in v17 as follows:
>
> == session 1
> create table t (c int);
> begin;
> grant select on t to public;
>
> == session 2
> alter table t add primary key (c);
>
> == back in session 1
> commit;
>
>
> We'll likely need to change how we maintain relhasindex or perhaps take a lock
> in GRANT.

The main choice is accepting more DDL blocking vs. accepting inefficient
relcache builds. Options I'm seeing:

=== "more DDL blocking" option family

B1. Take ShareUpdateExclusiveLock in GRANT, REVOKE, and anything that makes
transactional pg_class updates without holding some stronger lock. New
asserts could catch future commands failing to do this.

B2. Take some shorter-lived lock around pg_class tuple formation, such that
GRANT blocks CREATE INDEX, but two CREATE INDEX don't block each other.
Anything performing a transactional update of a pg_class row would acquire
the lock in exclusive mode before fetching the old tuple and hold it till
end of transaction. relhasindex=true in-place updates would acquire it
the same way, but they would release it after the inplace update. I
expect a new heavyweight lock type, e.g. LOCKTAG_RELATION_DEFINITION, with
the same key as LOCKTAG_RELATION. This has less blocking than the
previous option, but it's more complicated to explain to both users and
developers.

B3. Have CREATE INDEX do an EvalPlanQual()-style thing to update all successor
tuple versions. Like the previous option, this would require new locking,
but the new lock would not need to persist till end of xact. It would be
even more complicated to explain to users and developers. (If this is
promising enough to warrant more detail, let me know.)

B4. Use transactional updates to set relhasindex=true. Two CREATE INDEX
commands on the same table would block each other. If we did it the way
most DDL does today, they'd get "tuple concurrently updated" failures
after the blocking ends.

=== "inefficient relcache builds" option family

R1. Ignore relhasindex; possibly remove it in v17. Relcache builds et
al. will issue more superfluous queries.

R2. As a weird variant of the previous option, keep relhasindex and make all
transactional updates of pg_class set relhasindex=true pessimistically.
(VACUUM will set it back to false.)

=== other

O1. This is another case where the sometimes-discussed "pg_class_nt" for
nontransactional columns would help. I'm ruling that out as too hard to
back-patch.

Are there other options important to consider? I currently like (B1) the
most, followed closely by (R1) and (B2). A key unknown is the prevalence of
index-free tables. Low prevalence would argue in favor of (R1). In my
limited experience, they've been rare. That said, I assume relcache builds
happen a lot more than GRANTs, so it's harder to bound the damage from (R1)
compared to the damage from (B1). Thoughts on this decision?

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ryo Matsumura (Fujitsu) 2023-10-27 19:07:36 Buf fix: update-po for PGXS does not work
Previous Message Tomas Vondra 2023-10-27 18:28:09 Re: Annoying corruption in PostgreSQL.