Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Date: 2023-10-13 15:00:00
Message-ID: 5df89a35-a985-b287-7210-6ed7d2c31803@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello Tom,

13.10.2023 00:01, Tom Lane wrote:
> I came back to this question today, and after more thought I'm going
> to cast my vote for "let's not trust SearchSysCacheExists()+lookup".
> It just seems like that's too fragile. The case reported in this
> thread of it failing in parallel workers but not main should scare
> anybody, and the future course of development seems far more likely
> to introduce new problems than remove them.

Thanks for digging into this!

> I spent some time looking through existing SearchSysCacheExists calls,
> and I could only find two sets of routines where we seem to be
> depending on SearchSysCacheExists to protect a subsequent lookup
> somewhere else, and there isn't any lock on the object in question.
> Those are the has_foo_privilege functions discussed here, and the
> foo_is_visible functions near the bottom of namespace.c. I'm not
> sure why we've not heard complaints traceable to the foo_is_visible
> family. Maybe nobody has tried hard to break them, or maybe they
> are just less likely to be used in ways that are at risk.

I'll try to research/break xxx_is_visible and share my findings tomorrow.

> It turns out to not be that hard to get rid of the problem in the
> has_foo_privilege family, as per attached patches. I've not looked
> at fixing the foo_is_visible family, but that probably ought to be a
> separate patch anyway.

Yeah, the attached patches greatly improve consistency. The only
inconsistency I've found in the patches is a missing comma here:
+ * Exported generic routine for checking a user's access privileges to an object
+ * with is_missing

You removed "this case is not a user-facing error, so elog not ereport",
and I think it's good for consistency too, as all aclmask/aclcheck
functions now use ereport().

> BTW, while nosing around I found what seems like a very nasty related
> bug. Suppose that a catalog tuple being loaded into syscache contains
> some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
> involving fetches from toast tables that will certainly cause
> AcceptInvalidationMessages calls. What if one of those should have
> invalidated this tuple? We will not notice, because it's not in
> the hashtable yet. When we do add it, we will mark it not-dead,
> meaning that the stale entry looks fine and could persist for a long
> while.

Yeah, it's an interesting case that needs investigation, IMO.
I'll try to look into this and construct the test case in the background.

> 0001's changes in acl.c are straightforward, but it's worth noting
> that the has_sequence_privilege functions hadn't gotten the memo
> about not failing when a bogus relation OID is passed.

I've looked through all functions has_*priv*_id and all they have
"if (is_missing) PG_RETURN_NULL()" now (with the patches applied).

Best regards,
Alexander

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2023-10-14 02:34:43 Re: BUG #18130: \copy fails with "could not read block" or "page should be empty but not" errors due to triggers
Previous Message PG Bug reporting form 2023-10-13 14:26:56 BUG #18156: Self-referential foreign key in partitioned table not enforced on deletes