Re: race condition in pg_class

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Smolkin Grigory <smallkeen(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Subject: Re: race condition in pg_class
Date: 2024-08-20 08:59:45
Message-ID: 63c27c17-0d8f-4ede-a624-3ee41d28b539@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17/08/2024 07:07, Noah Misch wrote:
> On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote:
>> On 14/07/2024 20:48, Noah Misch wrote:
>>> + * ... [any slow preparation not requiring oldtup] ...
>>> + * heap_inplace_update_scan([...], &tup, &inplace_state);
>>> + * if (!HeapTupleIsValid(tup))
>>> + * elog(ERROR, [...]);
>>> + * ... [buffer is exclusive-locked; mutate "tup"] ...
>>> + * if (dirty)
>>> + * heap_inplace_update_finish(inplace_state, tup);
>>> + * else
>>> + * heap_inplace_update_cancel(inplace_state);
>>
>> I wonder if the functions should be called "systable_*" and placed in
>> genam.c rather than in heapam.c. The interface looks more like the existing
>> systable functions. It feels like a modularity violation for a function in
>> heapam.c to take an argument like "indexId", and call back into systable_*
>> functions.
>
> Yes, _scan() and _cancel() especially are wrappers around systable. Some API
> options follow. Any preference or other ideas?
>
> ==== direct s/heap_/systable_/ rename [option 1]
>
> systable_inplace_update_scan([...], &tup, &inplace_state);
> if (!HeapTupleIsValid(tup))
> elog(ERROR, [...]);
> ... [buffer is exclusive-locked; mutate "tup"] ...
> if (dirty)
> systable_inplace_update_finish(inplace_state, tup);
> else
> systable_inplace_update_cancel(inplace_state);
>
> ==== make the first and last steps more systable-like [option 2]
>
> systable_inplace_update_begin([...], &tup, &inplace_state);
> if (!HeapTupleIsValid(tup))
> elog(ERROR, [...]);
> ... [buffer is exclusive-locked; mutate "tup"] ...
> if (dirty)
> systable_inplace_update(inplace_state, tup);
> systable_inplace_update_end(inplace_state);
>
> ==== no systable_ wrapper for middle step, more like CatalogTupleUpdate [option 3]
>
> systable_inplace_update_begin([...], &tup, &inplace_state);
> if (!HeapTupleIsValid(tup))
> elog(ERROR, [...]);
> ... [buffer is exclusive-locked; mutate "tup"] ...
> if (dirty)
> heap_inplace_update(relation,
> systable_inplace_old_tuple(inplace_state),
> tup,
> systable_inplace_buffer(inplace_state));
> systable_inplace_update_end(inplace_state);

My order of preference is: 2, 1, 3.

>> Could we just stipulate that you must always hold LOCKTAG_TUPLE when you
>> call heap_update() on pg_class or pg_database? That'd make the rule simple.
>
> We could. That would change more code sites. Rough estimate:
>
> $ git grep -E CatalogTupleUpd'.*(class|relrelation|relationRelation)' | wc -l
> 23
>
> If the count were 2, I'd say let's simplify the rule like you're exploring.
> (I originally had a complicated rule for pg_database, but I abandoned that
> when it helped few code sites.) If it were 100, I'd say the complicated rule
> is worth it. A count of 23 makes both choices fair.

Ok.

How many of those for RELKIND_INDEX vs tables? I'm thinking if we should
always require a tuple lock on indexes, if that would make a difference.

> Long-term, I hope relfrozenxid gets reimplemented with storage outside
> pg_class, removing the need for inplace updates. So the additional 23 code
> sites might change back at a future date. That shouldn't be a big
> consideration, though.
>
> Another option here would be to preface that README section with a simplified
> view, something like, "If a warning brought you here, take a tuple lock. The
> rest of this section is just for people needing to understand the conditions
> for --enable-casserts emitting that warning." How about that instead of
> simplifying the rules?

Works for me. Or perhaps the rules could just be explained more
succinctly. Something like:

-----
pg_class heap_inplace_update_scan() callers: before the call, acquire a
lock on the relation in mode ShareUpdateExclusiveLock or stricter. If
the update targets a row of RELKIND_INDEX (but not
RELKIND_PARTITIONED_INDEX), that lock must be on the table, locking the
index rel is not necessary. (This allows VACUUM to overwrite per-index
pg_class while holding a lock on the table alone.)
heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE in
InplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple it
overwrites.

pg_class heap_update() callers: before copying the tuple to modify, take
a lock on the tuple, or a ShareUpdateExclusiveLock or stricter on the
relation.

SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
Most heap_update() callers already hold a suitable lock on the relation
for other reasons, and can skip the tuple lock. If you do acquire the
tuple lock, release it immediately after the update.

pg_database: before copying the tuple to modify, all updaters of
pg_database rows acquire LOCKTAG_TUPLE. (Few updaters acquire
LOCKTAG_OBJECT on the database OID, so it wasn't worth extending that as
a second option.)
-----

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Pyhalov 2024-08-20 09:14:44 Re: Asynchronous MergeAppend
Previous Message Michael Paquier 2024-08-20 08:53:24 Re: Remaining reference to _PG_fini() in ldap_password_func