Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Андрей Рачицкий <therealgofman(at)mail(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18545: \dt breaks transaction, calling error when executed in SET SESSION AUTHORIZATION
Date: 2024-09-19 21:35:33
Message-ID: 2042942.1726781733@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

I wrote:
> Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
>> This commit seems to trigger elog(), not reproducible in the
>> parent commit.
>> 6e086fa2e77 Allow parallel workers to cope with a newly-created session user ID.

>> postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
>> ERROR: pg_attribute catalog is missing 26 attribute(s) for relation OID 70321

> I've been poking at this all day, and I still have little idea what's
> going on.

Got it, after a good deal more head-scratching. Here's the relevant
parts of ParallelWorkerMain:

/*
* We've changed which tuples we can see, and must therefore invalidate
* system caches.
*/
InvalidateSystemCaches();

/*
* Restore GUC values from launching backend. We can't do this earlier,
* because GUC check hooks that do catalog lookups need to see the same
* database state as the leader.
*/
gucspace = shm_toc_lookup(toc, PARALLEL_KEY_GUC, false);
RestoreGUCState(gucspace);

...

/* Restore relmapper state. */
relmapperspace = shm_toc_lookup(toc, PARALLEL_KEY_RELMAPPER_STATE, false);
RestoreRelationMap(relmapperspace);

InvalidateSystemCaches blows away the worker's relcache. Then
RestoreGUCState causes some catalog lookups (tracing shows that
restoring default_text_search_config is what triggers this on my
setup), and in particular pg_attribute's relcache entry will get
constructed to support that. Then we wheel in a new set of
relation map entries *without doing anything about what that
might invalidate*.

In the given test case, the globally-visible relmap says that
pg_attribute's relfilenode is, say, XXXX. But we are busy rewriting
it, so the parent process has an "active" relmap entry that says
pg_attribute's relfilenode is YYYY. Given the above, the worker
process will have built a pg_attribute relcache entry that contains
XXXX, and even though it now knows YYYY is the value it should be
using, that information never makes it to the worker's relcache.

The upshot of this is that when the parallel heap scan machinery
doles out some block numbers for the parent process to read, and
some other block numbers for the worker to read, the worker is
reading those block numbers from the pre-clustering copy of
pg_attribute, which most likely doesn't match the post-clustering
image. This accounts for the missing and duplicate tuples I was
seeing in the scan output.

Of course, the reason 6e086fa2e made this visible is that before
that, any catalog reads triggered by RestoreGUCState were done
in an earlier transaction, and then we would blow away the ensuing
relcache entries in InvalidateSystemCaches. So there was no bug
as long as you assume that the "..." code doesn't cause any
catalog reads. I'm not too sure of that though --- it's certainly
not very comfortable to assume that functions like SetCurrentRoleId
and SetTempNamespaceState will never attempt a catalog lookup.

The code has another hazard too, which is that this all implies
that the GUC-related catalog lookups will be done against the
globally-visible relmap state not whatever is active in the parent
process. I have not tried to construct a POC showing that that
can give incorrect answers (that is, different from what the
parent thinks), but it seems plausible that it could.

So the fix seems clear to me: RestoreRelationMap needs to happen
before anything that could result in catalog lookups. I'm kind
of inclined to move up the adjacent restores of non-transactional
low-level stuff too, particularly RestoreReindexState which has
direct impact on how catalog lookups are done.

Independently of that, it's annoying that the parallel heap scan
machinery failed to notice that it was handing out block numbers
for two different relfilenodes. I'm inclined to see if we can
put some Asserts in there that would detect that. This particular
bug would have been far easier to diagnose that way, and it hardly
seems unlikely that "worker is reading the wrong relation" could
happen with other mistakes in future.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2024-09-19 23:37:37 Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
Previous Message Andrei Lepikhov 2024-09-19 16:47:30 Volatile functions under Memoize node

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-09-19 21:52:02 Re: Should rolpassword be toastable?
Previous Message Noah Misch 2024-09-19 21:33:46 Re: race condition in pg_class