Re: Recovering from detoast-related catcache invalidations

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Xiaoran Wang <fanfuxiaoran(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Recovering from detoast-related catcache invalidations
Date: 2024-12-14 00:06:53
Message-ID: 2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
Lists: pgsql-hackers

On 13/12/2024 17:30, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>> CatalogCacheCreateEntry() can accept catcache invalidations when it
>> opens the toast table, and it now has recheck logic to detect the case
>> that the tuple it's processing (ntp) is invalidated. However, isn't it
>> also possible that it accepts an invalidation message for a tuple that
>> we had processed in an earlier iteration of the loop? Or that a new
>> catalog tuple was inserted that should be part of the list we're building?
>
> The expectation is that the list will be built and returned to the
> caller, but it's already marked as stale so it will be rebuilt
> on next request.

Ah, you mean this at the end:

>     /* mark list dead if any members already dead */
>     if (ct->dead)
>         cl->dead = true;

OK, I missed that. It does not handle the second scenario, though: if a
new catalog tuple that should be part of the list is inserted
concurrently, it is missed.
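
The difference between the two scenarios can be illustrated with a toy
model in C. This is only a sketch under stated assumptions, not
PostgreSQL code: ToyCatalog, ToyEntry, ToyList, toy_build_list() and the
two hook functions are hypothetical names, and fixing the scan bound
before the loop is an assumption standing in for a scan that does not
see a tuple inserted after it began.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TUPLES 8

/* Toy stand-in for a catalog: an array of keys (hypothetical). */
typedef struct {
    int    keys[MAX_TUPLES];
    size_t ntuples;
} ToyCatalog;

/* Toy stand-in for a cache entry; only the "dead" flag matters here. */
typedef struct {
    int  key;
    bool dead;
} ToyEntry;

/* Toy stand-in for a cached list of entries. */
typedef struct {
    ToyEntry members[MAX_TUPLES];
    size_t   nmembers;
    bool     dead;
} ToyList;

/* Hypothetical hook simulating concurrent activity during the build. */
typedef void (*ToyHook)(ToyCatalog *cat, ToyList *list);

/* Scenario 1: a member that was already processed gets invalidated. */
void toy_invalidate_first(ToyCatalog *cat, ToyList *list)
{
    (void) cat;
    list->members[0].dead = true;
}

/* Scenario 2: a new tuple that belongs in the list is inserted. */
void toy_insert_new(ToyCatalog *cat, ToyList *list)
{
    (void) list;
    assert(cat->ntuples < MAX_TUPLES);
    cat->keys[cat->ntuples++] = 99;
}

void toy_build_list(ToyCatalog *cat, ToyList *list, ToyHook hook)
{
    /* Assumption: the scan only sees tuples that existed when it began. */
    size_t scan_end = cat->ntuples;

    assert(scan_end <= MAX_TUPLES);
    list->nmembers = 0;
    list->dead = false;

    for (size_t i = 0; i < scan_end; i++)
    {
        list->members[list->nmembers].key = cat->keys[i];
        list->members[list->nmembers].dead = false;
        list->nmembers++;

        /* Concurrent activity arrives after the first member is built. */
        if (i == 0 && hook != NULL)
            hook(cat, list);
    }

    /* mark list dead if any members already dead */
    for (size_t i = 0; i < list->nmembers; i++)
    {
        if (list->members[i].dead)
            list->dead = true;
    }
}
```

With toy_invalidate_first as the hook, the finished list has dead set,
so it would be rebuilt on the next request. With toy_insert_new, the
catalog ends up with three tuples while the list has two members and
dead stays false, so nothing in the model flags the list as stale.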

I was able to reproduce that, by pausing a process with gdb while it's
building the list in SearchCatCacheList():

1. Create a function called foofunc(integer). Its body must be large
enough that its pg_proc tuple is toasted.

2. In one backend, run "SELECT foofunc(1)". It calls
FuncnameGetCandidates(), which calls
"SearchSysCacheList1(PROCNAMEARGSNSP, CStringGetDatum(funcname));". Put
a breakpoint in SearchCatCacheList() just after the systable_beginscan()
call.

3. In another backend, create function foofunc() with no args.

4. Continue execution from the breakpoint.

5. Run "SELECT foofunc()" in the first session. It fails to find the
function. The error persists: it will keep failing to find that function
if you try again, until the syscache is invalidated again for some reason.

Attached is an injection point test case to reproduce that. If you
change the test to make the function's body short enough that it is not
toasted, the test passes.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment: 0001-Demonstrate-catcache-list-invalidation-bug.patch (text/x-patch, 5.0 KB)
