Re: race condition in pg_class

From: Nitin Motiani <nitinmotiani(at)google(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Smolkin Grigory <smallkeen(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Subject: Re: race condition in pg_class
Date: 2024-09-04 15:30:32
Message-ID: CAH5HC96ASNkXyYVEG7pO98qr1WARGTohxGjVpBsu4c3hkM_OcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 4, 2024 at 2:53 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
>
> > So this also pulls the invalidation to the genam.c
> > layer. Am I understanding this correctly?
>
> Compared to the v9 patch, the "call both" alternative would just move the
> systable_endscan() call to a new systable_inplace_update_end(). It wouldn't
> move anything across the genam:heapam boundary.
> systable_inplace_update_finish() would remain a thin wrapper around a heapam
> function.
>

Thanks for the clarification.

> > > > > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
> > > > > tolerable now, this gets less attractive after the inplace160 patch from
> > > > > https://postgr.es/m/flat/20240523000548(dot)58(dot)nmisch(at)google(dot)com
> > > > >
> > > > I skimmed through the inplace160 patch. It wasn't clear to me why this
> > > > becomes less attractive with the patch. I see there is a new
> > > > CacheInvalidateHeapTupleInPlace but that looks like it would be called
> > > > while holding the lock. And then there is an
> > > > AcceptInvalidationMessages which can perhaps be moved to the genam.c
> > > > layer too. Is the concern that one invalidation call will be in the
> > > > heapam layer and the other will be in the genam layer?
> > >
> > > That, or a critical section would start in heapam.c, then end in genam.c.
> > > Current call tree at inplace160 v4:
>
> > How about this alternative?
> >
> > genam.c:systable_inplace_update_finish
> > PreInplace_Inval
> > START_CRIT_SECTION
> > heapam.c:heap_inplace_update
> > BUFFER_LOCK_UNLOCK
> > AtInplace_Inval
> > END_CRIT_SECTION
> > UnlockTuple
> > AcceptInvalidationMessages
>
> > Looking at inplace160, it seems that the start of the critical section
> > is right after PreInplace_Inval. So why not pull START_CRIT_SECTION
> > and END_CRIT_SECTION out to the genam.c layer?
>
> heap_inplace_update() has an elog(ERROR) that needs to happen outside any
> critical section. Since the condition for that elog deals with tuple header
> internals, it belongs at the heapam layer more than the systable layer.
>

Understood. How about this alternative then? The tuple length check
and the elog(ERROR) gets its own function. Something like
heap_inplace_update_validate or
heap_inplace_update_validate_tuple_length. So in that case, it would
look like this :

genam.c:systable_inplace_update_finish
heapam.c:heap_inplace_update_validate/heap_inplace_update_precheck
PreInplace_Inval
START_CRIT_SECTION
heapam.c:heap_inplace_update
BUFFER_LOCK_UNLOCK
AtInplace_Inval
END_CRIT_SECTION
UnlockTuple
AcceptInvalidationMessages

This is starting to get complicated though so I don't have any issues
with just renaming the heap_inplace_update to
heap_inplace_update_and_unlock.

> > Alternatively since
> > heap_inplace_update is commented as a subroutine of
> > systable_inplace_update_finish, should everything just be moved to the
> > genam.c layer? Although it looks like you already considered and
> > rejected this approach.
>
> Calling XLogInsert(RM_HEAP_ID) in genam.c would be a worse modularity
> violation than the one that led to the changes between v8 and v9. I think
> even calling CacheInvalidateHeapTuple() in genam.c would be a worse modularity
> violation than the one attributed to v8. Modularity would have the
> heap_inplace function resemble heap_update() handling of invals.
>

Understood. Thanks.

> > If the above alternatives are not possible, it's probably fine to go
> > ahead with the current patch with the function renamed to
> > heap_inplace_update_and_unlock (or something similar) as mentioned
> > earlier?
>
> I like that name. The next version will use it.
>

So either we go with this or try the above approach of having a
separate function _validate/_precheck/_validate_tuple_length. I don't
have a strong opinion on either of these approaches.

> > > > I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> > > > we can just call
> > > > IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> > > > big advantage of storing a separate bool field? Also there is another
> > >
> > > No, not a big advantage. I felt it was more in line with the typical style of
> > > src/backend/executor.
>
> > just find it a smell that there are two fields which are not
> > independent and they go out of sync. And that's why my preference is
> > to not have a dependent field unless there is a specific advantage.
>
> Got it. This check happens for every tuple of every UPDATE, so performance
> may be a factor. Some designs and their merits:
>

Thanks. If performance is a factor, it makes sense to keep it.

> ==== a. ri_needLockTagTuple
> Performance: best: check one value for nonzero
> Drawback: one more value lifecycle to understand
> Drawback: users of ResultRelInfo w/o InitResultRelInfo() could miss this
>
> ==== b. call IsInplaceUpdateRelation
> Performance: worst: two extern function calls, then compare against two values
>
> ==== c. make IsInplaceUpdateRelation() and IsInplaceUpdateOid() inline, and call
> Performance: high: compare against two values
> Drawback: unlike catalog.c peers
> Drawback: extensions that call these must recompile if these change
>
> ==== d. add IsInplaceUpdateRelationInline() and IsInplaceUpdateOidInline(), and call
> Performance: high: compare against two values
> Drawback: more symbols to understand
> Drawback: extensions might call these, reaching the drawback of (c)
>
> I think my preference order is (a), (c), (d), (b). How do you see it?
>

My preference order would be the same. In general I like (c) more than
(a) but recompiling extensions sounds like a major drawback so here
the preference is (a).

Can we do (a) along with some extra checks? To elaborate, execMain.c
has a function called CheckValidateRelationRel which is called by
ExecInitModifyTable in nodeModifyTable.c. Can we add the following
assert assert (or just a debug assert) in this function?

Assert(rel->ri_needsLockTagTuple == IsInplaceUpdateRelation(rel->relationDesc)

This can safeguard against users of ResultRelInfo missing this field.
An alternative might be to only do debug assertions in the functions
which use the field. But it seems simpler to just do it once in the
ExecInitModifyTable.

> > But even for the
> > ri_TrigDesc, CatalogOpenIndexes() still sets it to NULL. So shouldn't
> > ri_needLockTagTuple also be set to a default value of false?
>
> CatalogOpenIndexes() explicitly zero-initializes two fields and relies on
> makeNode() zeroing for dozens of others. Hence, omitting the initialization
> fits the function's local convention better than including it. (PostgreSQL
> has no policy or dominant practice about redundant zero-initialization.)

Thanks. Makes sense.

Thanks & Regards
Nitin Motiani
Google

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Benoit Lobréau 2024-09-04 15:31:25 Re: Logging parallel worker draught
Previous Message Benoit Lobréau 2024-09-04 15:25:41 Re: Parallel workers stats in pg_stat_database