Re: Enhancing Memory Context Statistics Reporting

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enhancing Memory Context Statistics Reporting
Date: 2024-12-24 20:37:46
Message-ID: CAH2L28sfqiY6VzzUesZbWQL5Ypkuhs=m5U-taV4W8Gwjg0NPLQ@mail.gmail.com
Lists: pgsql-hackers

Hi Tomas,

>
> I'd just remove that. I agree it might have been useful with the single
> chunk of shared memory, but I think with separate chunks it's not very
> useful. And if we can end up with multiple processes getting the same
> pgprocno I guess we have way bigger problems, this won't fix that.
>

OK, fixed accordingly in the attached patch.

> >
> > > In the patch, since the timeout was set to a high value, pgbench
> > > ended up stuck waiting for the timeout to occur. The failure happens
> > > less frequently after I added an additional check for the process's
> > > existence, but it cannot be entirely avoided. This is because a
> > > process can terminate after we check for its existence but before it
> > > signals the client. In such cases, the client will not receive any
> > > signal.
> > >
> >
> > Hmmm, I see. I guess there's no way to know if a process responds to
> > us, but I guess it should be possible to wake up regularly and check
> > if the process still exists? Wouldn't that solve the case you
> > mentioned?
> >
> > I have fixed it accordingly in the attached patch by waking up after
> > every 5 seconds to check if the process exists and sleeping again if
> > the wake-up condition is not satisfied. The number of such tries is
> > limited to 20. So, the total wait time can be 100 seconds. I will make
> > the re-tries configurable, in line with your suggestion to be able to
> > override the default waiting time.
> >
>
> Makes sense, although 100 seconds seems a bit weird, it seems we usually
> pick "natural" values like 60s, or multiples of that. But if it's
> configurable, that's not a huge issue.
>
> Could the process wake up earlier than the timeout, say if it gets
> interrupted by a signal (EINTR)? That'd break the "total timeout is 100
> seconds", and it would be
> better to check that explicitly. Not sure if this can happen, though.
>
Not sure, I will check again. According to the comment on WaitLatch, a
process waiting on it should only wake up when the timeout expires or
SetLatch is called.

> One thing I'd maybe consider is starting with a short timeout, and
> gradually increasing it until e.g. 5 seconds (or maybe just 1 second
> would be perfectly fine, IMHO). With the current coding it means we
> either get the response right away, or wait 5+ seconds. That's a pretty
> huge jump. If we start e.g. with 10ms, and then gradually multiply it by
> 1.2, it means we only wait "0-20% extra" on average.
>
> But perhaps this is very unlikely and not worth the complexity.
>

OK. Currently I have changed it to always wait for a signal from the
backend or a timeout before checking the exit condition. This is to
ensure that a backend gets a chance to publish the new statistics, since
I am retaining the old statistics for the reasons explained below. I will
experiment with setting a shorter timeout and gradually increasing it.
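
For illustration, a minimal sketch of what such a client-side wait loop
could look like, combining the process-existence check with a
geometrically growing WaitLatch timeout. This is not the patch code:
wait_for_memstats() and MEMSTATS_MAX_WAIT_MS are hypothetical names, the
kill(pid, 0) check is a Unix-style placeholder, and WAIT_EVENT_PG_SLEEP
merely stands in for a dedicated wait event.

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

#define MEMSTATS_MAX_WAIT_MS    5000    /* cap for a single sleep */

static bool
wait_for_memstats(pid_t target_pid, long total_timeout_ms)
{
    long        wait_ms = 10;       /* start with a short timeout */
    long        elapsed_ms = 0;

    while (elapsed_ms < total_timeout_ms)
    {
        int         rc;

        /* Bail out early if the target process no longer exists. */
        if (kill(target_pid, 0) != 0)
            return false;

        rc = WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                       wait_ms,
                       WAIT_EVENT_PG_SLEEP);
        ResetLatch(MyLatch);

        /* The target process sets our latch after publishing its stats. */
        if (rc & WL_LATCH_SET)
            return true;

        elapsed_ms += wait_ms;

        /* Grow the timeout by roughly 20%, capped at 5 seconds. */
        wait_ms = Min(wait_ms * 12 / 10 + 1, MEMSTATS_MAX_WAIT_MS);
    }

    return false;               /* gave up after total_timeout_ms */
}

Starting at 10ms with a 1.2 growth factor reaches the 5-second cap after
a few dozen iterations, so the common case returns almost immediately
while the worst case still honours the overall timeout.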

> > In the case of checkpointer, I also see some wait time after running the
> > tests that you mentioned, but it eventually completes the request in my
> > runs.
> >
>
> OK, but why should it even wait that long? Surely the checkpointer
> should be able to report memory contexts too?

The checkpointer responds to requests promptly when the requests are
sequential. However, a timeout may occur if concurrent requests for memory
statistics are sent to the checkpointer.

In this case, one client sends a GetMemoryContext signal to the
checkpointer. The checkpointer sets the PublishMemoryContextPending flag
to true in the handler for this signal. This flag remains true until
CHECK_FOR_INTERRUPTS() is called, which processes the interrupt and clears
the flag.

If another process concurrently sends a GetMemoryContext signal to the
checkpointer before CHECK_FOR_INTERRUPTS() is called for the previous
signal, the PublishMemoryContextPending flag will already be set to true.
When CHECK_FOR_INTERRUPTS() is eventually called by the checkpointer, it
processes both requests and dumps its memory context statistics.
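
The pattern is roughly the following (a sketch only; the handler name and
PublishMemoryContextStatistics() are placeholders for whatever the patch
actually uses):

#include "postgres.h"

#include <signal.h>

#include "miscadmin.h"
#include "storage/latch.h"

static volatile sig_atomic_t PublishMemoryContextPending = false;

static void PublishMemoryContextStatistics(void);   /* hypothetical */

/* Signal handler: only record that a request arrived and wake the process. */
static void
HandleGetMemoryContextSignal(void)
{
    PublishMemoryContextPending = true;
    SetLatch(MyLatch);
}

/* Reached from CHECK_FOR_INTERRUPTS() in the checkpointer's main loop. */
static void
ProcessGetMemoryContextInterrupt(void)
{
    if (PublishMemoryContextPending)
    {
        PublishMemoryContextPending = false;

        /*
         * Two requests that arrive before this point both find the flag
         * already set, so the statistics are dumped only once and the two
         * waiting clients race to read that single dump.
         */
        PublishMemoryContextStatistics();
    }
}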

However, only one of the two waiting clients gets to read the statistics.
This is because the first client that gains access to the shared
statistics reads the data and frees the DSA memory after it is done. As a
result, the second client keeps waiting until it times out, since the
checkpointer has already processed its request and sent the statistics,
which the second client never gets to read.

I believe that retaining the DSAs with the latest statistics after each
request would help resolve the issue of request timeouts in scenarios with
concurrent requests. I have included this in the attached patch.
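
On the reading side this mainly means copying the data out and detaching
instead of freeing. A minimal sketch, with made-up names (MemoryStatsEntry,
stats_handle, stats_dp) rather than the actual patch code:

#include "postgres.h"

#include "utils/dsa.h"

typedef struct MemoryStatsEntry     /* placeholder layout for illustration */
{
    char        name[64];
    int64       totalspace;
    int64       freespace;
} MemoryStatsEntry;

static void
read_memstats_from_dsa(dsa_handle stats_handle, dsa_pointer stats_dp)
{
    dsa_area   *area = dsa_attach(stats_handle);
    MemoryStatsEntry *stats;

    stats = (MemoryStatsEntry *) dsa_get_address(area, stats_dp);

    /* ... copy the fields of *stats into backend-local memory here ... */
    (void) stats;

    /*
     * No dsa_free(): the area is kept around with the latest statistics,
     * so a second client whose request was coalesced with ours can still
     * read the dump, and the publishing process simply overwrites it on
     * the next request.
     */
    dsa_detach(area);
}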

> and the other processes were waiting for 107145 in some way. But yeah,
> detecting the dead process would improve this, although it also shows
> the issues can "spread" easily.
>
> OTOH it's unlikely to have multiple pg_get_process_memory_contexts()
> queries pointing at each other like this - monitoring will just do that
> from one backend, and that's it. So not a huge issue.
>
Makes sense.

> >
> > In such cases, it might be necessary to experiment with the waiting
> > times at the receiving client. Making the waiting time
> > user-configurable, as you suggested, by passing it as an argument to
> > the function, could help address this scenario.
> > Thanks for highlighting this, I will test this some more.
> >
>
> I think we should try very hard to make this work well without the user
> having to mess with the timeouts. These are exceptional conditions that
> happen only very rarely, which makes it hard to find good values.
>
OK.

> I'm entirely unconcerned about the pg_get_process_memory_contexts()
> performance, within some reasonable limits. It's something executed
> every now and then - no one is going to complain it takes 10ms extra,
> measure tps with this function, etc.
>
> 17-26% seems surprisingly high, but even 256kB is too much, IMHO. I'd
> just get rid of this optimization until someone complains and explains
> why it's worth it.
>
> Yes, let's make it fast, but I don't think we should optimize it at the
> expense of "regular workload" ...
>
>
After debugging the concurrent requests timeout issue, it appears there is
yet another argument in favor of avoiding the recreation of DSAs for every
request: we get to retain the last reported statistics for a given
Postgres process, which can help prevent certain requests from failing in
case of concurrent requests to the same process.

Thank you,
Rahila Syed

Attachment Content-Type Size
v8-0001-Function-to-report-memory-context-stats-of-any-backe.patch application/octet-stream 40.8 KB
