From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | exclusion(at)gmail(dot)com |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used |
Date: | 2023-07-26 02:29:23 |
Message-ID: | 20230726.112923.27361680552823861.horikyota.ntt@gmail.com |
Lists: | pgsql-bugs |
At Tue, 25 Jul 2023 13:00:00 +0300, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote in
> Hi Tom,
>
> 21.07.2023 22:21, Tom Lane wrote:
> > Yes, we certainly want to do that during LockRelationOid. But what
> > seems to be happening here is an inval while we are closing/unlocking
> > the catalog we got the syscache entry from. That is, the expected
> > behavior here is:
> >
> > SearchSysCacheExists:
> >
> > * is entry present-and-valid?
> > No, so...
> >
> > * open and lock relevant catalog (with possible inval)
> >
> > * scan catalog, find desired row, create valid syscache entry
> >
> > * close and unlock catalog
> >
> > * return success
> >
> > SearchSysCache1 (from pg_class_aclmask_ext):
> >
> > * is entry present-and-valid?
> > Yes, so increment its refcount and return it
> >
> > There is no inval in the entry-already-present code path in syscache
> > lookup. So if we are seeing this failure, ISTM it must mean that an
> > inval is happening during "close and unlock catalog", which seems like
> > something that we don't want. But I've not traced exactly how that
> > happens.
>
> Yes, but here we deal with -DCATCACHE_FORCE_RELEASE (added to config_env
> on prion), so the cache entry, that was just found in
> SearchSysCacheExists(), is removed immediately because of
> SearchSysCacheExists() -> ReleaseSysCache(tuple) ->
> ReleaseCatCache(tuple).
>
> So, while the construction "if (SearchSysCacheExists())
> ... SearchSysCache1()"
> seems robust for normal conditions, it might be broken when catcache
> entries released forcefully.
I agree about the safety of the construct.
> Thus, if the worst consequence of the issue is sporadic
> test failures on prion, then may be fix it in a least invasive way (on
> level 1).
> 1) test xmlmap fails sporadically due to the catalog changes caused by
> parallel tests activity
> 2) schema_to_xmlschemaX() can fail when parallel workers are used
> 3) has_table_privilegeX() can fail sporadically when executed within a
> parallel worker
Doesn't this imply that the function isn't parallel-safe? The issue
goes away when it and all of its variants are marked as
parallel-restricted. That seems to be a reasonable way to address this
issue.
> 4) SearchSysCacheX(RELOID, ...) can switch to a newer catalog snapshot,
> when repeated in a parallel worker
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center