| From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> | 
|---|---|
| To: | exclusion(at)gmail(dot)com | 
| Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Subject: | Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used | 
| Date: | 2023-07-26 02:29:23 | 
| Message-ID: | 20230726.112923.27361680552823861.horikyota.ntt@gmail.com | 
| Lists: | pgsql-bugs | 
At Tue, 25 Jul 2023 13:00:00 +0300, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote in 
> Hi Tom,
> 
> 21.07.2023 22:21, Tom Lane wrote:
> > Yes, we certainly want to do that during LockRelationOid.  But what
> > seems to be happening here is an inval while we are closing/unlocking
> > the catalog we got the syscache entry from.  That is, the expected
> > behavior here is:
> >
> > SearchSysCacheExists:
> >
> >    * is entry present-and-valid?
> >      No, so...
> >
> >    * open and lock relevant catalog (with possible inval)
> >
> >    * scan catalog, find desired row, create valid syscache entry
> >
> >    * close and unlock catalog
> >
> >    * return success
> >
> > SearchSysCache1 (from pg_class_aclmask_ext):
> >
> >    * is entry present-and-valid?
> >      Yes, so increment its refcount and return it
> >
> > There is no inval in the entry-already-present code path in syscache
> > lookup.  So if we are seeing this failure, ISTM it must mean that an
> > inval is happening during "close and unlock catalog", which seems like
> > something that we don't want.  But I've not traced exactly how that
> > happens.
> 
> Yes, but here we deal with -DCATCACHE_FORCE_RELEASE (added to config_env
> on prion), so the cache entry, that was just found in
> SearchSysCacheExists(), is removed immediately because of
> SearchSysCacheExists() ->  ReleaseSysCache(tuple) ->
> ReleaseCatCache(tuple).
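
To make the mechanism concrete, here is a toy model of an existence check followed by a second lookup. It is only a sketch, not PostgreSQL code: every name in it (toy_search, toy_release, toy_exists, toy_check_then_fetch, toy_reset, catalog_row_visible, force_release) is hypothetical, the cache holds a single entry, and invalidation processing is left out entirely. The flag force_release merely stands in for what a CATCACHE_FORCE_RELEASE build does, and change_between stands in for the catalog scan seeing something different the second time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are PostgreSQL APIs. */

static bool catalog_row_visible = true; /* what a fresh catalog scan would find */
static bool force_release = false;      /* stands in for CATCACHE_FORCE_RELEASE */

typedef struct ToyEntry
{
    bool valid;
    int  refcount;
} ToyEntry;

static ToyEntry cache_entry = { false, 0 };

/* Return the cached entry; on a miss, rebuild it from the "catalog". */
static ToyEntry *toy_search(void)
{
    if (!cache_entry.valid)
    {
        if (!catalog_row_visible)
            return NULL;            /* a fresh scan finds no row */
        cache_entry.valid = true;
        cache_entry.refcount = 0;
    }
    cache_entry.refcount++;
    return &cache_entry;
}

/* Drop one reference; with force_release the unreferenced entry is discarded. */
static void toy_release(ToyEntry *e)
{
    e->refcount--;
    if (force_release && e->refcount == 0)
        e->valid = false;
}

/* Existence check: look up, then release immediately. */
static bool toy_exists(void)
{
    ToyEntry *e = toy_search();

    if (e == NULL)
        return false;
    toy_release(e);
    return true;
}

/* Put the model back into a known state. */
void toy_reset(bool force)
{
    catalog_row_visible = true;
    force_release = force;
    cache_entry.valid = false;
    cache_entry.refcount = 0;
}

/*
 * The "if (exists) ... search" pattern. change_between simulates the row
 * no longer being visible to a scan started after the existence check.
 * Returns false when the second lookup fails or the row never existed.
 */
bool toy_check_then_fetch(bool change_between)
{
    ToyEntry *e;

    if (!toy_exists())
        return false;
    if (change_between)
        catalog_row_visible = false;
    e = toy_search();               /* hit if the entry survived, else rescan */
    if (e == NULL)
        return false;
    toy_release(e);
    return true;
}
```

In this model, toy_reset(false) followed by toy_check_then_fetch(true) returns true, because the entry created by the existence check is still cached. After toy_reset(true) the same call returns false, because the entry was discarded on release and the second lookup has to go back to the catalog.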
> 
> So, while the construction "if (SearchSysCacheExists()) ... SearchSysCache1()"
> seems robust for normal conditions, it might be broken when catcache entries
> released forcefully.
I agree about the safety of the construct.
> Thus, if the worst consequence of the issue is sporadic test failures on
> prion, then may be fix it in a least invasive way (on level 1).
> 1) test xmlmap fails sporadically due to the catalog changes caused by
>  parallel tests activity
> 2) schema_to_xmlschemaX() can fail when parallel workers are used
> 3) has_table_privilegeX() can fail sporadically when executed within a
>  parallel worker
Doesn't this imply that the function isn't parallel-safe? The issue
goes away once it and all its variants are marked parallel-restricted.
That seems to be a reasonable way to address this issue.
> 4) SearchSysCacheX(RELOID, ...) can switch to a newer catalog snapshot,
>  when repeated in a parallel worker
regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center