Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: exclusion(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Date: 2023-07-26 02:29:23
Message-ID: 20230726.112923.27361680552823861.horikyota.ntt@gmail.com
Lists: pgsql-bugs

At Tue, 25 Jul 2023 13:00:00 +0300, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote in
> Hi Tom,
>
> 21.07.2023 22:21, Tom Lane wrote:
> > Yes, we certainly want to do that during LockRelationOid. But what
> > seems to be happening here is an inval while we are closing/unlocking
> > the catalog we got the syscache entry from. That is, the expected
> > behavior here is:
> >
> > SearchSysCacheExists:
> >
> > * is entry present-and-valid?
> > No, so...
> >
> > * open and lock relevant catalog (with possible inval)
> >
> > * scan catalog, find desired row, create valid syscache entry
> >
> > * close and unlock catalog
> >
> > * return success
> >
> > SearchSysCache1 (from pg_class_aclmask_ext):
> >
> > * is entry present-and-valid?
> > Yes, so increment its refcount and return it
> >
> > There is no inval in the entry-already-present code path in syscache
> > lookup. So if we are seeing this failure, ISTM it must mean that an
> > inval is happening during "close and unlock catalog", which seems like
> > something that we don't want. But I've not traced exactly how that
> > happens.
>
> Yes, but here we deal with -DCATCACHE_FORCE_RELEASE (added to config_env
> on prion), so the cache entry that was just found in
> SearchSysCacheExists() is removed immediately because of
> SearchSysCacheExists() -> ReleaseSysCache(tuple) ->
> ReleaseCatCache(tuple).
>
> So, while the construction "if (SearchSysCacheExists()) ...
> SearchSysCache1()" seems robust for normal conditions, it might be
> broken when catcache

I agree about the safety of the construct.

> entries are released forcefully. Thus, if the worst consequence of the
> issue is sporadic test failures on prion, then maybe fix it in the
> least invasive way (on level 1).
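
To make the failure mode concrete, here is a minimal toy model in C of
the "exists, then search" construct. It is only a sketch and not
PostgreSQL code: ToyCatalog, ToyCache and all toy_* functions are
hypothetical names, the force_release flag merely stands in for
-DCATCACHE_FORCE_RELEASE, and flipping row_visible stands in for the
second catalog scan seeing a different catalog state. A real build
without forced release can still drop entries on invalidation, which
this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are PostgreSQL's real API. */

typedef struct ToyCatalog
{
    bool row_visible;       /* does a fresh catalog scan see the row? */
} ToyCatalog;

typedef struct ToyCache
{
    bool valid;             /* is an entry currently cached? */
    int  refcount;          /* pins held by callers */
    bool force_release;     /* stand-in for -DCATCACHE_FORCE_RELEASE */
    int  scans;             /* number of catalog scans performed */
} ToyCache;

/* Look the row up; on success the entry is pinned and true is returned. */
static bool
toy_search(ToyCache *cache, const ToyCatalog *catalog)
{
    if (!cache->valid)
    {
        /* Cache miss: scan the catalog as it looks right now. */
        cache->scans++;
        if (!catalog->row_visible)
            return false;
        cache->valid = true;
    }
    /* Cache hit (or freshly built entry): just pin it. */
    cache->refcount++;
    return true;
}

/* Drop one pin; with force_release the unpinned entry is discarded. */
static void
toy_release(ToyCache *cache)
{
    assert(cache->refcount > 0);
    cache->refcount--;
    if (cache->force_release && cache->refcount == 0)
        cache->valid = false;
}

/* Existence check: search, then release immediately. */
static bool
toy_exists(ToyCache *cache, const ToyCatalog *catalog)
{
    if (!toy_search(cache, catalog))
        return false;
    toy_release(cache);
    return true;
}

/*
 * Run "if (exists) ... search" while the catalog view changes between
 * the two lookups.  Returns whether the second lookup still succeeds.
 */
static bool
toy_exists_then_search(bool force_release)
{
    ToyCatalog catalog = { .row_visible = true };
    ToyCache   cache = { .valid = false, .refcount = 0,
                         .force_release = force_release, .scans = 0 };
    bool       found;

    if (!toy_exists(&cache, &catalog))
        return false;

    /* The row is no longer visible to a fresh scan. */
    catalog.row_visible = false;

    found = toy_search(&cache, &catalog);
    if (found)
        toy_release(&cache);
    return found;
}
```

In this model toy_exists_then_search(false) returns true, because the
second lookup is a cache hit and never rescans, while
toy_exists_then_search(true) returns false, because the entry was
discarded on release and the second lookup has to scan again.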

> 1) test xmlmap fails sporadically due to the catalog changes caused by
> parallel tests activity
> 2) schema_to_xmlschemaX() can fail when parallel workers are used

> 3) has_table_privilegeX() can fail sporadically when executed within a
> parallel worker

Doesn't this imply that the function isn't parallel-safe? The issue
goes away when it and all of its variants are marked as
parallel-restricted. That seems to be a reasonable way to address this
issue.

> 4) SearchSysCacheX(RELOID, ...) can switch to a newer catalog snapshot,
> when repeated in a parallel worker

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
