Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Date: 2023-07-21 17:20:55
Message-ID: 969017.1689960055@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> I think that we need to determine the level where the problem that should
> be fixed is:
> 1) test xmlmap fails sporadically due to the catalog changes caused by
>  parallel tests activity
> 2) schema_to_xmlschemaX() can fail when parallel workers are used
> 3) has_table_privilegeX() can fail sporadically when executed within a
>  parallel worker
> 4) SearchSysCacheX(RELOID, ...) can switch to a newer catalog snapshot,
>  when repeated in a parallel worker

Yeah, that's not immediately obvious. IIUC, the situation we are
looking at is that SearchSysCacheExists can succeed even though the
tuple we found is already dead at the instant that the function
exits (thanks to absorption of inval messages during relation_close).
The fact that that only happens in parallel workers is pure chance
really. It is not okay for has_table_privilegeX to depend on the
fact that the surrounding query already has some lock on pg_class.
So this means that the approach has_table_privilegeX uses of
assuming that successful SearchSysCacheExists means it can call
pg_class_aclcheck without fear is just broken.

If we suppose that that assumption is only being made in the
has_foo_privilege functions, then one way we could fix it is to extend
the API of pg_class_aclcheck etc to add a no-error-on-not-found flag,
and get rid of the separate SearchSysCacheExists check. However,
I can't avoid the suspicion that we have other places assuming the
same thing. So I think what we really ought to be doing is one
of two things:

1. Hack SearchSysCacheExists to account for this issue, by making it
loop if it finds a syscache entry but sees that the entry is already
dead. (We have to loop, not just return false, in case the row was
updated rather than deleted.) Maybe all the syscache lookup
functions need to do likewise; it's certainly not intuitively
reasonable for them to return already-known-stale entries.

2. Figure out how come we are executing a cache inval on the way
out of syscache entry creation, and stop that from happening.

I like #2 better if it's not hard to do cleanly. However, I'm not
quite sure how we are getting to an inval during relation close;
maybe that's not something we want to prevent.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alexander Lakhin 2023-07-21 19:00:01 Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Previous Message Alexander Lakhin 2023-07-21 09:00:01 Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used