Re: race condition in pg_class

From: Nitin Motiani <nitinmotiani(at)google(dot)com>
To: noah(at)leadboat(dot)com
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Smolkin Grigory <smallkeen(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Subject: Re: race condition in pg_class
Date: 2024-08-29 15:38:43
Message-ID: CAH5HC94OEarC9=FbS0o4D5DYKODMeYjVJ_KxxcBRQFMiMAOfZg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > My order of preference is: 2, 1, 3.
>
> I kept tuple locking responsibility in heapam.c. That's simpler and better
> for modularity, but it does mean we release+acquire after any xmax wait.
> Before, we avoided that if the next genam.c scan found the same TID. (If the
> next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
> are rare enough to justify simplifying as this version does. I don't expect
> anyone to notice the starvation outside of tests built to show it. (With
> previous versions, one can show it with a purpose-built test that commits
> instead of aborting, like the "001_pgbench_grant(at)9" test.)
>
> This move also loses the optimization of unpinning before XactLockTableWait().
> heap_update() doesn't optimize that way, so that's fine.
>
> The move ended up more like (1), though I did do
> s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
> felt that worked better than (2) to achieve lock release before
> CacheInvalidateHeapTuple(). Alternatives that could be fine:
>
From a consistency point of view, I find it cleaner if all the
heap_inplace_lock and heap_inplace_unlock calls live in the same set
of functions. Here, those would be the systable_inplace_* functions.
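Concretely, the calling pattern I have in mind is roughly the
following (a sketch only; the function names are the ones from this
thread, but the exact signatures are my assumption):

    void       *state;
    HeapTuple   oldtup;

    /* heap_inplace_lock is taken inside _begin */
    systable_inplace_update_begin(relation, indexId, true, NULL,
                                  nkeys, key, &oldtup, &state);
    if (oldtup != NULL)
    {
        /* ... modify the tuple copy ... */
        systable_inplace_update_finish(state, oldtup);  /* unlock inside */
    }
    else
        systable_inplace_update_cancel(state);          /* unlock inside */

That way no caller ever sees an unpaired heap_inplace_lock.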

> - In the cancel case, call both systable_inplace_update_cancel and
> systable_inplace_update_end. _finish or _cancel would own unlock, while
> _end would own systable_endscan().
>
What happens to CacheInvalidateHeapTuple() in this approach? I think
it would still need to be brought up to the genam.c layer if we are
releasing the lock in systable_inplace_update_finish.
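Just to spell out my reading of that (a sketch; CacheInvalidateHeapTuple
is the existing function, everything else here is my assumption):

    /* genam.c, sketch */
    void
    systable_inplace_update_finish(void *state, HeapTuple tuple)
    {
        ...
        heap_inplace_unlock(...);          /* lock released first ... */
        CacheInvalidateHeapTuple(relation, tuple, NULL);
                                           /* ... then inval, now in genam.c */
    }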

> - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
> tolerable now, this gets less attractive after the inplace160 patch from
> https://postgr.es/m/flat/20240523000548(dot)58(dot)nmisch(at)google(dot)com
>
I skimmed through the inplace160 patch. It wasn't clear to me why
this becomes less attractive with the patch. I see there is a new
CacheInvalidateHeapTupleInPlace, but that looks like it would be
called while holding the lock. And there is also an
AcceptInvalidationMessages call, which could perhaps be moved to the
genam.c layer too. Is the concern that one invalidation call would
then be in the heapam layer while the other is in the genam layer?
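For reference, this is the shape I took away from skimming the patch
(my paraphrase in pseudocode, not the patch's actual code):

    heap_inplace_lock(...);             /* via systable_inplace_update_begin */
    CacheInvalidateHeapTupleInPlace(relation, tuple);  /* while holding the lock */
    /* ... write the new contents, WAL-log, unlock ... */
    AcceptInvalidationMessages();       /* candidate to move to genam.c? */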

Also, I have a small question about inplace120.

I see that in all the places where we check
resultRelInfo->ri_needLockTagTuple, we could instead call
IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
big advantage to storing a separate bool field? Also, there is
another write to ri_RelationDesc, in CatalogOpenIndexes in
src/backend/catalog/indexing.c; I think ri_needLockTagTuple needs to
be set there too to keep it consistent with ri_RelationDesc. Please
let me know if I am missing something about the usage of the new
field.
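
In code terms, the question is whether the cached field buys much over
computing the condition where it is needed (sketch; the Assert is
mine, not from the patch):

    /* any reader could compute the condition on the fly ... */
    if (IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc))
        ...

    /* ... and wherever both exist, they have to stay in sync: */
    Assert(resultRelInfo->ri_needLockTagTuple ==
           IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc));

and for CatalogOpenIndexes() the consistent initialization would
presumably be something like (my assumption):

    resultRelInfo->ri_RelationDesc = heapRel;
    resultRelInfo->ri_needLockTagTuple = IsInplaceUpdateRelation(heapRel);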

Thanks & Regards,
Nitin Motiani
Google
