Re: Missing LWLock protection in pgstat_reset_replslot()

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Missing LWLock protection in pgstat_reset_replslot()
Date: 2024-12-05 07:11:54
Message-ID: Z1FSOj66GaJUTbVi@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Dec 05, 2024 at 01:26:38PM +0900, Michael Paquier wrote:
> On Wed, Dec 04, 2024 at 03:20:03PM +0000, Bertrand Drouvot wrote:
> > I need to think more about it but it seems to me that those values make sense,
> > so maybe we should drop the entry for this particular case (shmem_exit()).
>
> The values reported in the central hash table make sense to me.

Yeah.

> What
> does not is that we could hold in the local cache of the process doing
> the peeks references to stats from a different slot when doing drops
> and creates that would reuse the same stats key.
>
> I have added in the backends some logs to get into the details of the
> sequence with this specific test regarding the generation markup and
> the refcount (feel free to use the 0002 attached, grep for "key
> 4/0/0" in your logs by replaying the test), and here some details
> based on the two permutations in the test sent:

Agree.

I had created about the same 0002 and was seeing the exact same behavior that
lead to my previous message. BTW, that's a good idea to share your 0002, I'll keep
in mind to do the same in the future (for ease of discussion).

> The assertion is right to complain here at shutdown when writing the
> stats: we shouldn't try to drop this entry.

Yes.

> I've spent a few hours
> checking the whole, and we are doing the right things with
> gc_request_count, pgStatSharedRefAge and the local cache refresh, as
> I've thought first that we were not aggressive with this part of the
> cache resets. The problem, as far as I can see, is that
> pgstat_gc_entry_refs() did not get the call that it needs to think
> about refreshing its local references when an entry updates its
> generation number; it only checks if the entry has been dropped, the
> generation check is equally important after a reinit because the local
> reference is also outdated.

I had the same diagnostic but did not reach the "what should be done" yet.

I agree with the proposal and with the fact that 818119afcc did miss to use
the "generation" in pgstat_gc_entry_refs() to also discard reinitialized entries:

- if (!entry_ref->shared_entry->dropped)
+ /*
+ * "generation" checks for the case of entries being reinitialized, and
+ * "dropped" for the case where these are.. dropped.
+ */
+ if (!entry_ref->shared_entry->dropped &&
+ pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
+ entry_ref->generation)

Yeah that seems the right thing to do for me too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2024-12-05 07:23:16 Re: Doc: clarify the log message level of the VERBOSE option
Previous Message Amit Langote 2024-12-05 06:53:32 Re: generic plans and "initial" pruning