Re: race condition in pg_class

From: Noah Misch <noah(at)leadboat(dot)com>
To: Nitin Motiani <nitinmotiani(at)google(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Smolkin Grigory <smallkeen(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Subject: Re: race condition in pg_class
Date: 2024-09-03 21:23:29
Message-ID: 20240903212329.61.nmisch@google.com
Lists: pgsql-hackers

On Tue, Sep 03, 2024 at 09:24:52PM +0530, Nitin Motiani wrote:
> On Sat, Aug 31, 2024 at 6:40 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> > > On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > - In the cancel case, call both systable_inplace_update_cancel and
> > > > systable_inplace_update_end. _finish or _cancel would own unlock, while
> > > > _end would own systable_endscan().
> > > >
> > > What happens to CacheInvalidateHeapTuple() in this approach? I think
> > > it will still need to be brought to the genam.c layer if we are
> > > releasing the lock in systable_inplace_update_finish.

> My understanding is that the code in this approach would look like below:
> if (dirty)
> systable_inplace_update_finish(inplace_state, tup);
> else
> systable_inplace_update_cancel(inplace_state);
> systable_inplace_update_end(inplace_state);
>
> And that in this structure, both _finish and _cancel will call
> heap_inplace_unlock and then _end will call systable_endscan. So even
> with this structure, the invalidation has to happen inside _finish
> after the unlock.

Right.

> So this also pulls the invalidation to the genam.c
> layer. Am I understanding this correctly?

Compared to the v9 patch, the "call both" alternative would just move the
systable_endscan() call to a new systable_inplace_update_end(). It wouldn't
move anything across the genam:heapam boundary.
systable_inplace_update_finish() would remain a thin wrapper around a heapam
function.
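A minimal stand-alone sketch of that "call both" API shape, with stub bodies in place of the real heapam and scan machinery (the struct fields and stub names are illustrative only, not PostgreSQL code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the scan state carried between the calls. */
typedef struct SysScanDescData
{
	bool		tuple_locked;	/* stands in for the buffer/tuple lock */
	bool		scan_open;		/* stands in for the open index scan */
} SysScanDescData;
typedef SysScanDescData *SysScanDesc;

static void heap_inplace_update(SysScanDesc s) { /* overwrite tuple, WAL-log */ }
static void heap_inplace_unlock(SysScanDesc s) { s->tuple_locked = false; }
static void systable_endscan_stub(SysScanDesc s) { s->scan_open = false; }

/* _finish and _cancel each own the unlock; _end owns systable_endscan(). */
static void
systable_inplace_update_finish(SysScanDesc s)
{
	heap_inplace_update(s);
	heap_inplace_unlock(s);
}

static void
systable_inplace_update_cancel(SysScanDesc s)
{
	heap_inplace_unlock(s);
}

static void
systable_inplace_update_end(SysScanDesc s)
{
	systable_endscan_stub(s);
}
```

A caller would follow the pattern quoted above: exactly one of _finish or _cancel, then always _end.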

> > > > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
> > > > tolerable now, this gets less attractive after the inplace160 patch from
> > > > https://postgr.es/m/flat/20240523000548(dot)58(dot)nmisch(at)google(dot)com
> > > >
> > > I skimmed through the inplace160 patch. It wasn't clear to me why this
> > > becomes less attractive with the patch. I see there is a new
> > > CacheInvalidateHeapTupleInPlace but that looks like it would be called
> > > while holding the lock. And then there is an
> > > AcceptInvalidationMessages which can perhaps be moved to the genam.c
> > > layer too. Is the concern that one invalidation call will be in the
> > > heapam layer and the other will be in the genam layer?
> >
> > That, or a critical section would start in heapam.c, then end in genam.c.
> > Current call tree at inplace160 v4:

> How about this alternative?
>
> genam.c:systable_inplace_update_finish
> PreInplace_Inval
> START_CRIT_SECTION
> heapam.c:heap_inplace_update
> BUFFER_LOCK_UNLOCK
> AtInplace_Inval
> END_CRIT_SECTION
> UnlockTuple
> AcceptInvalidationMessages

> Looking at inplace160, it seems that the start of the critical section
> is right after PreInplace_Inval. So why not pull START_CRIT_SECTION
> and END_CRIT_SECTION out to the genam.c layer?

heap_inplace_update() has an elog(ERROR) that needs to happen outside any
critical section. Since the condition for that elog deals with tuple header
internals, it belongs at the heapam layer more than the systable layer.
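The ordering constraint can be sketched as follows; since elog(ERROR) inside a critical section is promoted to PANIC, the length check must run first. The macros and the return code standing in for elog(ERROR) are illustrative, not the real PostgreSQL definitions:

```c
#include <stdbool.h>
#include <stddef.h>

static bool critical_section_active = false;

#define START_CRIT_SECTION() (critical_section_active = true)
#define END_CRIT_SECTION()   (critical_section_active = false)

static int
heap_inplace_update_sketch(size_t oldlen, size_t newlen)
{
	/*
	 * Tuple-header detail: an in-place update may not change the tuple
	 * length.  Checking here, before the critical section, keeps the
	 * failure an ERROR rather than a PANIC.
	 */
	if (newlen != oldlen)
		return -1;				/* stands in for elog(ERROR, ...) */

	START_CRIT_SECTION();
	/* memcpy new data over the old tuple, MarkBufferDirty, XLogInsert ... */
	END_CRIT_SECTION();
	return 0;
}
```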

> Alternatively since
> heap_inplace_update is commented as a subroutine of
> systable_inplace_update_finish, should everything just be moved to the
> genam.c layer? Although it looks like you already considered and
> rejected this approach.

Calling XLogInsert(RM_HEAP_ID) in genam.c would be a worse modularity
violation than the one that led to the changes between v8 and v9. I think
even calling CacheInvalidateHeapTuple() in genam.c would be a worse modularity
violation than the one attributed to v8. Modularity would have the
heap_inplace function resemble heap_update()'s handling of invals.

> If the above alternatives are not possible, it's probably fine to go
> ahead with the current patch with the function renamed to
> heap_inplace_update_and_unlock (or something similar) as mentioned
> earlier?

I like that name. The next version will use it.

> > > I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> > > we can just call
> > > IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> > > big advantage of storing a separate bool field? Also there is another
> >
> > No, not a big advantage. I felt it was more in line with the typical style of
> > src/backend/executor.

> I just find it a smell that there are two fields which are not
> independent and can go out of sync. And that's why my preference is
> to not have a dependent field unless there is a specific advantage.

Got it. This check happens for every tuple of every UPDATE, so performance
may be a factor. Some designs and their merits:

==== a. ri_needLockTagTuple
Performance: best: check one value for nonzero
Drawback: one more value lifecycle to understand
Drawback: users of ResultRelInfo w/o InitResultRelInfo() could miss this

==== b. call IsInplaceUpdateRelation
Performance: worst: two extern function calls, then compare against two values

==== c. make IsInplaceUpdateRelation() and IsInplaceUpdateOid() inline, and call
Performance: high: compare against two values
Drawback: unlike catalog.c peers
Drawback: extensions that call these must recompile if these change

==== d. add IsInplaceUpdateRelationInline() and IsInplaceUpdateOidInline(), and call
Performance: high: compare against two values
Drawback: more symbols to understand
Drawback: extensions might call these, reaching the drawback of (c)

I think my preference order is (a), (c), (d), (b). How do you see it?
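To make the cost difference concrete, design (c)/(d) could look like the sketch below: an inline definition compiles down to two OID comparisons at each call site, versus option (b)'s two extern function calls. The OID constants are the well-known pg_class and pg_database OIDs; the *Inline name follows design (d) and is hypothetical:

```c
#include <stdbool.h>

typedef unsigned int Oid;

#define RelationRelationId 1259 /* pg_class */
#define DatabaseRelationId 1262 /* pg_database */

/* Inline, so each call site reduces to two integer comparisons. */
static inline bool
IsInplaceUpdateOidInline(Oid relid)
{
	return relid == RelationRelationId ||
		   relid == DatabaseRelationId;
}
```

Design (a) instead caches this answer once at ResultRelInfo setup time, making the per-tuple check a single nonzero test.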

> But even for the
> ri_TrigDesc, CatalogOpenIndexes() still sets it to NULL. So shouldn't
> ri_needLockTagTuple also be set to a default value of false?

CatalogOpenIndexes() explicitly zero-initializes two fields and relies on
makeNode() zeroing for dozens of others. Hence, omitting the initialization
fits the function's local convention better than including it. (PostgreSQL
has no policy or dominant practice about redundant zero-initialization.)
