From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Missing LWLock protection in pgstat_reset_replslot() |
Date: | 2024-12-05 07:11:54 |
Message-ID: | Z1FSOj66GaJUTbVi@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Thu, Dec 05, 2024 at 01:26:38PM +0900, Michael Paquier wrote:
> On Wed, Dec 04, 2024 at 03:20:03PM +0000, Bertrand Drouvot wrote:
> > I need to think more about it but it seems to me that those values make sense,
> > so maybe we should drop the entry for this particular case (shmem_exit()).
>
> The values reported in the central hash table make sense to me.
Yeah.
> What
> does not is that we could hold in the local cache of the process doing
> the peeks references to stats from a different slot when doing drops
> and creates that would reuse the same stats key.
>
> I have added in the backends some logs to get into the details of the
> sequence with this specific test regarding the generation markup and
> the refcount (feel free to use the 0002 attached, grep for "key
> 4/0/0" in your logs by replaying the test), and here some details
> based on the two permutations in the test sent:
Agree.
I had created about the same 0002 and was seeing the exact same behavior that
lead to my previous message. BTW, that's a good idea to share your 0002, I'll keep
in mind to do the same in the future (for ease of discussion).
> The assertion is right to complain here at shutdown when writing the
> stats: we shouldn't try to drop this entry.
Yes.
> I've spent a few hours
> checking the whole, and we are doing the right things with
> gc_request_count, pgStatSharedRefAge and the local cache refresh, as
> I've thought first that we were not aggressive with this part of the
> cache resets. The problem, as far as I can see, is that
> pgstat_gc_entry_refs() did not get the call that it needs to think
> about refreshing its local references when an entry updates its
> generation number; it only checks if the entry has been dropped, the
> generation check is equally important after a reinit because the local
> reference is also outdated.
I had the same diagnostic but did not reach the "what should be done" yet.
I agree with the proposal and with the fact that 818119afcc did miss to use
the "generation" in pgstat_gc_entry_refs() to also discard reinitialized entries:
- if (!entry_ref->shared_entry->dropped)
+ /*
+ * "generation" checks for the case of entries being reinitialized, and
+ * "dropped" for the case where these are.. dropped.
+ */
+ if (!entry_ref->shared_entry->dropped &&
+ pg_atomic_read_u32(&entry_ref->shared_entry->generation) ==
+ entry_ref->generation)
Yeah that seems the right thing to do for me too.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Yugo NAGATA | 2024-12-05 07:23:16 | Re: Doc: clarify the log message level of the VERBOSE option |
Previous Message | Amit Langote | 2024-12-05 06:53:32 | Re: generic plans and "initial" pruning |