Re: Enhancing Memory Context Statistics Reporting

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enhancing Memory Context Statistics Reporting
Date: 2024-11-28 23:23:57
Message-ID: CAH2L28uvYHX1CfmbcB9JPzR6+MzJ+kb9ob+NvFy3wUHM8HUkGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

Thank you for the review.

>
>
> 1) I read through the thread, and in general I agree with the reasoning
> for removing the file part - it seems perfectly fine to just dump as
> much as we can fit into a buffer, and then summarize the rest. But do we
> need to invent a "new" limit here? The other places logging memory
> contexts do something like this:
>
> MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
>
> Which means we only print the 100 memory contexts at the top, and that's
> it. Wouldn't that give us a reasonable memory limit too?
>
> I think this prints more than 100 memory contexts, since 100 denotes the
max_level
and contexts at each level could have upto 100 children. This limit seems
much higher than
what I am currently storing in DSA which is approx. 7000 contexts. I will
verify this again.

> 2) I see the function got renamed to pg_get_process_memory_contexts(),
> bu the comment still says pg_get_remote_backend_memory_contexts().
>
> Fixed

>
> 3) I don't see any SGML docs for this new function. I was a bit unsure
> what the "summary" argument is meant to do. The comment does not explain
> that either.
>
> Added docs.
Intention behind adding a summary argument is to report statistics of
contexts at level 0
and 1 i.e TopMemoryContext and its immediate children.

4) I wonder if the function needs to return PID. I mean, the caller
> knows which PID it is for, so it seems rather unnecessary.
>
> Perhaps it can be used to ascertain that the information indeed belongs to
the requested pid.

5) In the "summary" mode, it might be useful to include info about how
> many child contexts were aggregated. It's useful to know whether there
> was 1 child or 10000 children. In the regular (non-summary) mode it'd
> always be "1", probably, but maybe it'd interact with the limit in (1).
> Not sure.
>
> Sure, I will add this in the next iteration.

>
> 6) I feel a bit uneasy about the whole locking / communication scheme.
> In particular, I'm worried about lockups, deadlocks, etc. So I decided
> to do a trivial stress-test - just run the new function through pgbench
> with many clients.
>
> The memstats.sql script does just this:
>
> SELECT * FROM pg_get_process_memory_contexts(
> (SELECT pid FROM pg_stat_activity
> WHERE pid != pg_backend_pid()
> ORDER BY random() LIMIT 1)
> , false);
>
> where the inner query just picks a PID for some other backend, and asks
> for memory context stats for that.
>
> And just run it like this on a scale 1 pgbench database:
>
> pgbench -n -f memstats.sql -c 10 test
>
> And it gets stuck *immediately*. I've seen it to wait for other client
> backends and auxiliary processes like autovacuum launcher.
>
> This is absolutely idle system, there's no reason why a process would
> not respond almost immediately.

In my reproduction, this issue occurred because the process was terminated
while the requesting backend was waiting on the condition variable to be
signaled by it. I don’t see any solution other than having the waiting
client
backend timeout using ConditionVariableTimedSleep.

In the patch, since the timeout was set to a high value, pgbench ended up
stuck
waiting for the timeout to occur. The failure happens less frequently after
I added an
additional check for the process's existence, but it cannot be entirely
avoided. This is because a process can terminate after we check for its
existence but
before it signals the client. In such cases, the client will not receive
any signal.

I wonder if e.g. autovacuum launcher may
> not be handling these requests, or what if client backends can wait in a
> cycle.

Did not see a cyclic wait in client backends due to the pgbench stress test.

>
> 7) I've also seen this error:
>
> pgbench: error: client 6 script 0 aborted in command 0 query 0: \
> ERROR: can't attach the same segment more than once
>
I haven't investigated it, but it seems like a problem handling errors,
> where we fail to detach from a segment after a timeout.

Thanks for the hint, fixed by adding a missing call to dsa_detach after
timeout.

>
> > I opted for DSAs over DSMs to enable memory reuse by freeing
> > segments for subsequent statistics copies of the same backend,
> > without needing to recreate DSMs for each request.
>
> I feel like this might be a premature optimization - I don't have a
> clear idea how expensive it is to create DSM per request, but my
> intuition is that it's cheaper than processing the contexts and
> generating the info.
>
> I'd just remove that, unless someone demonstrates it really matters. I
> don't really worry about how expensive it is to process a request
> (within reason, of course) - it will happen only very rarely. It's more
> important to make sure there's no overhead when no one asks the backend
> for memory context info, and simplicity.
>
> Also, how expensive it is to just keep the DSA "just in case"? Imagine
> someone asks for the memory context info once - isn't it a was to still
> keep the DSA? I don't recall how much resources could that be.
>
> I don't have a clear opinion on that, I'm more asking for opinions.

Imagining a tool that periodically queries the backends for statistics,
it would be beneficial to avoid recreating the DSAs for each call.
Currently, DSAs of size 1MB per process
(i.e., a maximum of 1MB * (MaxBackends + auxiliary processes))
would be created and pinned for subsequent reporting. This size does
not seem excessively high, even for approx 100 backends and
auxiliary processes.

> 8) Two minutes seems pretty arbitrary, and also quite high. If a timeout
> is necessary, I think it should not be hard-coded.
>
> Not sure which is the ideal value. Changed it to 15 secs and added a
#define as of now.
Something that gives enough time for the process to respond but
does not hold up the client for too long would be ideal. 15 secs seem to
be not enough for github CI tests, which fail with timeout error with this
setting.

PFA an updated patch with the above changes.

Attachment Content-Type Size
v6-Function-to-report-memory-context-stats-of-any-backe.patch application/octet-stream 35.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-28 23:57:08 Re: More CppAsString2() in psql's describe.c
Previous Message Tom Lane 2024-11-28 22:44:06 Re: Added prosupport function for estimating numeric generate_series rows