Re: Missing LWLock protection in pgstat_reset_replslot()

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Missing LWLock protection in pgstat_reset_replslot()
Date: 2024-12-05 04:26:38
Message-ID: Z1Erfnw4lFsdT5L4@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 04, 2024 at 03:20:03PM +0000, Bertrand Drouvot wrote:
> I need to think more about it but it seems to me that those values make sense,
> so maybe we should drop the entry for this particular case (shmem_exit()).

The values reported in the central hash table make sense to me. What
does not is that we could hold in the local cache of the process doing
the peeks references to stats from a different slot when doing drops
and creates that would reuse the same stats key.

I have added in the backends some logs to get into the details of the
sequence with this specific test regarding the generation markup and
the refcount (feel free to use the 0002 attached, grep for "key
4/0/0" in your logs by replaying the test), and here some details
based on the two permutations in the test sent:
1) session 1: pg_create_logical_replication_slot(), does
pgstat_init_entry() with generation=0 refcount=1. Then does
pgstat_acquire_entry_ref, refcount++ now refcount=2.
2) session 2: pg_logical_slot_peek_binary_changes(), calls
pgstat_acquire_entry_ref(), refcount++ now refcount=3. Note that the
session keeps a hold on the replslot stats.
3) session 1: pg_drop_replication_slot(), calls
pgstat_release_entry_ref(), refcount-- now refcount=2. The object
cannot be dropped in pgstat_drop_entry_internal() as session 2 still
holds a reference at it has peeked at the previous stats, keeping a
local reference.
4) Second permutation begins. session 1 calls
pg_create_logical_replication_slot() and does pgstat_reinit_entry()
generation++ refcount++, generation=1 refcount=3 as the previous entry
of session 2 is still there. pgstat_acquire_entry_ref() is called
again, refcount++ now at 4.
5) session 2, does again pg_logical_slot_peek_binary_changes(), does
not interact with the central pgstats as it holds locally a reference
to the slot's entry, based on its previous generation=0, so its
pgstat_get_entry_ref() gets the stats reference from its local cache.
The mistake is here: at this stage session 2 should refresh its local
reference so as it points to the new generation entry.
6) session 1, pg_drop_replication_slot(), calls
pgstat_release_entry_ref() and refcount-- now at 3. It cannot drop the
entry as refcount>0.
7) session 2 shuts down, pgstat_release_entry_ref is called,
refcount-- now refcount=2. pgstat_release_entry_ref() is called a
second time, refcount-- now refcount=1, cannot drop the stats entry as
the generation is not the same, as the local reference comes from the
previous generation=0, current generation=1. As it stands, the entry
should not be dropped as it's been reinitialized once. The data in
the dshash is correct, but session 2 is doing the wrong thing and
should:
- Hold a local reference of generation 1.
- Drop the stats entry as it is the last one holding it.

> While I agree that your test case does produce a failed assertion, I don't
> think it's linked to b36fbd9f8d (which focused on retrieving consistent stats).

That's unrelated to b36fbd9f8d. The assertion can be reached after
818119afccd3. Before this commit, based on your test, the second
session doing the peeks would try to drop the stats entry of a
replication slot that has been re-created when it shuts down, which is
incorrect, because its local reference is from the first creation, and
it tries to drop the entry of the second creation.

The assertion is right to complain here at shutdown when writing the
stats: we shouldn't try to drop this entry. I've spent a few hours
checking the whole, and we are doing the right things with
gc_request_count, pgStatSharedRefAge and the local cache refresh, as
I've thought first that we were not aggressive with this part of the
cache resets. The problem, as far as I can see, is that
pgstat_gc_entry_refs() did not get the call that it needs to think
about refreshing its local references when an entry updates its
generation number; it only checks if the entry has been dropped, the
generation check is equally important after a reinit because the local
reference is also outdated.
--
Michael

Attachment Content-Type Size
0001-Fix-cleanup-of-local-cache-pgstats-entries-for-entry.patch text/x-diff 1.9 KB
0002-Add-various-logs-in-pgstats-to-debug-entry-handling.patch text/x-diff 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-12-05 04:26:53 Re: Cannot find a working 64-bit integer type on Illumos
Previous Message Thomas Munro 2024-12-05 04:23:08 Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)