From: Noah Misch <noah(at)leadboat(dot)com>
To: Nitin Motiani <nitinmotiani(at)google(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Smolkin Grigory <smallkeen(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Subject: Re: race condition in pg_class
Date: 2024-08-31 01:10:43
Message-ID: 20240831011043.2b.nmisch@google.com
Lists: pgsql-hackers

On Thu, Aug 29, 2024 at 09:08:43PM +0530, Nitin Motiani wrote:
> On Thu, Aug 29, 2024 at 8:11 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > > My order of preference is: 2, 1, 3.
> >
> > I kept tuple locking responsibility in heapam.c. That's simpler and better
> > for modularity, but it does mean we release+acquire after any xmax wait.
> > Before, we avoided that if the next genam.c scan found the same TID. (If the
> > next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
> > are rare enough to justify simplifying as this version does. I don't expect
> > anyone to notice the starvation outside of tests built to show it. (With
> > previous versions, one can show it with a purpose-built test that commits
> > instead of aborting, like the "001_pgbench_grant(at)9" test.)
> >
> > This move also loses the optimization of unpinning before XactLockTableWait().
> > heap_update() doesn't optimize that way, so that's fine.
> >
> > The move ended up more like (1), though I did do
> > s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
> > felt that worked better than (2) to achieve lock release before
> > CacheInvalidateHeapTuple(). Alternatives that could be fine:
> >
> From a consistency point of view, I find it cleaner if we can have all
> the heap_inplace_lock and heap_inplace_unlock in the same set of
> functions. So here those would be the systable_inplace_* functions.
That will technically be the case after inplace160, and I could make it so
here by inlining heap_inplace_unlock() into its heapam.c caller. Would that
be cleaner or less clean?
> > - In the cancel case, call both systable_inplace_update_cancel and
> > systable_inplace_update_end. _finish or _cancel would own unlock, while
> > _end would own systable_endscan().
> >
> What happens to CacheInvalidateHeapTuple() in this approach? I think
> it will still need to be brought to the genam.c layer if we are
> releasing the lock in systable_inplace_update_finish.
Cancel scenarios don't do invalidation. (Same under other alternatives.)
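
A rough sketch of the surface that alternative implies (hypothetical declarations; the _begin/_finish/_cancel names appear in the patch under discussion, _end is the alternative being floated, and the exact signatures are guesses, not the actual patch):

```c
/* genam.c -- hypothetical API split per the alternative above */

/* Locks the tuple and returns a copy; caller then calls _finish or _cancel. */
void systable_inplace_update_begin(Relation relation, Oid indexId,
                                   bool indexOK, Snapshot snapshot,
                                   int nkeys, const ScanKeyData *key,
                                   HeapTuple *oldtupcopy, void **state);

/* _finish and _cancel would own the tuple unlock ... */
void systable_inplace_update_finish(void *state, HeapTuple tuple);
void systable_inplace_update_cancel(void *state);

/* ... while _end would own systable_endscan(); callers would pair
 * _finish+_end on success or _cancel+_end on the cancel path. */
void systable_inplace_update_end(void *state);
```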
> > - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
> > tolerable now, this gets less attractive after the inplace160 patch from
> > https://postgr.es/m/flat/20240523000548(dot)58(dot)nmisch(at)google(dot)com
> >
> I skimmed through the inplace160 patch. It wasn't clear to me why this
> becomes less attractive with the patch. I see there is a new
> CacheInvalidateHeapTupleInPlace but that looks like it would be called
> while holding the lock. And then there is an
> AcceptInvalidationMessages which can perhaps be moved to the genam.c
> layer too. Is the concern that one invalidation call will be in the
> heapam layer and the other will be in the genam layer?
That, or a critical section would start in heapam.c, then end in genam.c.
Current call tree at inplace160 v4:

genam.c:systable_inplace_update_finish
    heapam.c:heap_inplace_update
        PreInplace_Inval
        START_CRIT_SECTION
        BUFFER_LOCK_UNLOCK
        AtInplace_Inval
        END_CRIT_SECTION
        UnlockTuple
    AcceptInvalidationMessages
If we hoisted all of invalidation up to the genam.c layer, a critical section
that starts in heapam.c would end in genam.c:

genam.c:systable_inplace_update_finish
    PreInplace_Inval
    heapam.c:heap_inplace_update
        START_CRIT_SECTION
        BUFFER_LOCK_UNLOCK
    AtInplace_Inval
    END_CRIT_SECTION
    UnlockTuple
    AcceptInvalidationMessages
If we didn't accept splitting the critical section but did accept splitting
invalidation responsibilities, one gets perhaps:

genam.c:systable_inplace_update_finish
    PreInplace_Inval
    heapam.c:heap_inplace_update
        START_CRIT_SECTION
        BUFFER_LOCK_UNLOCK
        AtInplace_Inval
        END_CRIT_SECTION
    UnlockTuple
    AcceptInvalidationMessages
That's how I ended up at inplace120 v9's design.
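
Expressed as rough C pseudocode (a sketch only, following the first call tree above; the real functions take different arguments, and the bodies are elided):

```c
/* heapam.c -- owns the whole sequence except AcceptInvalidationMessages() */
void
heap_inplace_update(Relation relation, HeapTuple oldtup, HeapTuple tuple,
                    Buffer buffer)
{
    PreInplace_Inval();              /* before entering the critical section */
    START_CRIT_SECTION();
    /* ... overwrite the tuple in place, XLogInsert() ... */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    AtInplace_Inval();               /* still inside the critical section */
    END_CRIT_SECTION();
    UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
}

/* genam.c */
void
systable_inplace_update_finish(void *state, HeapTuple tuple)
{
    /* ... */
    heap_inplace_update(relation, oldtup, tuple, buffer);
    AcceptInvalidationMessages();    /* the only step left at this layer */
}
```

Keeping both ends of the critical section and both ends of the invalidation sequence inside heapam.c is what avoids the cross-layer splits shown in the other two call trees.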
> Also I have a small question from inplace120.
>
> I see that all the places we check resultRelInfo->ri_needLockTagTuple,
> we can just call
> IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a
> big advantage of storing a separate bool field? Also there is another
No, not a big advantage. I felt it was more in line with the typical style of
src/backend/executor.
> write to ri_RelationDesc in CatalogOpenIndexes in
> src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to
> be set there also to keep it consistent with ri_RelationDesc. Please
> let me know if I am missing something about the usage of the new
> field.
Can you say more about consequences you found?
Only the full executor reads the field, doing so when it fetches the most
recent version of a row. CatalogOpenIndexes() callers lack the full
executor's practice of fetching the most recent version of a row, so they
couldn't benefit from reading the field.
I don't think any CatalogOpenIndexes() caller passes its ResultRelInfo to the
full executor, and "typedef struct ResultRelInfo *CatalogIndexState" exists in
part to keep it that way. Since CatalogOpenIndexes() skips ri_TrigDesc and
other fields, I would expect other malfunctions if some caller tried.
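
For reference, the invariant in question amounts to roughly this (a sketch; the exact placement in inplace120 may differ):

```c
/* execMain.c:InitResultRelInfo() -- the full executor sets both together */
resultRelInfo->ri_RelationDesc = resultRelationDesc;
resultRelInfo->ri_needLockTagTuple =
    IsInplaceUpdateRelation(resultRelationDesc);

/*
 * indexing.c:CatalogOpenIndexes() also assigns ri_RelationDesc, but its
 * ResultRelInfo never reaches the full executor (hence the opaque
 * "typedef struct ResultRelInfo *CatalogIndexState"), so executor-only
 * fields such as ri_needLockTagTuple and ri_TrigDesc stay unset there.
 */
```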
Thanks,
nm