Quick Links

Re: Enhancing Memory Context Statistics Reporting

From:	Daniel Gustafsson <daniel(at)yesql(dot)se>
To:	Rahila Syed <rahilasyed90(at)gmail(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Enhancing Memory Context Statistics Reporting
Date:	2025-03-25 14:14:08
Message-ID:	F23526FA-407E-484B-88BC-4CA81C7CED52@yesql.se
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> On 20 Mar 2025, at 08:39, Rahila Syed <rahilasyed90(at)gmail(dot)com> wrote:

Thanks for the new version, I believe this will be a welcome tool in the
debugging toolbox.

I took a cleanup pass over the docs with among others the below changes:
* You had broken the text in paragraphs, but without <para/> tags they are
rendered as a single blob of text so added that.
* Removed the "(PID)" explanation as process id is used elsewhere on the same
page already without explanation.
* Added <productname/> markup on PostgreSQL
* Added <literal/> markup on paramater values
* Switched the example query output to use \x
* Added a note on when pg_backend_memory_contexts is a better choice
The paragraphs need some re-indenting to avoid too long lines, but I opted out
of doing so here to make reviewing the changes easier.

A few comments on the code (all comments are performed in 0003 attached here
which also has smaller cleanups wrt indentation, code style etc):

+#include <math.h>
I don't think we need this, maybe it was from an earlier version of the patch?

+MEM_CTX_PUBLISH "Waiting for backend to publish memory information."
I wonder if this should really be "process" and not backend?

+ default:
+ context_type = "???";
+ break;
In ContextTypeToString() I'm having second thoughts about this, there shouldn't
be any legitimate use-case of passing a nodetag this function which would fail
MemoryContextIsValid(). I wonder if we aren't helping callers more by erroring
out rather than silently returning an unknown? I haven't changed this but
maybe we should to set the API right from the start?

+ * if the process has more memory contexts than that can fit in the allocated
s/than that can/than what can/?

+ errmsg("memory context statistics privilege error"));
Similar checks elsewhere in the tree mostly use "permission denied to .." so I
think we should adopt that here as well.

+ LWLockAcquire(&memCtxState[procNumber].lw_lock, LW_EXCLUSIVE);
+ msecs =
+ TimestampDifferenceMilliseconds(curr_timestamp,
+ memCtxState[procNumber].stats_timestamp);
Since we only want to consider the stats if they are from the current process,
we can delay checking the time difference until after we've checked the pid and
thus reduce the amount of time we hold the lock in the error case.

+ /*
+ * Recheck the state of the backend before sleeping on the condition
+ * variable
+ */
+ proc = BackendPidGetProc(pid);
Here we are really rechecking that the process is still alive, but I wonder if
we should take the opportunity to ensure that the type is what we expect it to
be? If the pid has moved from being a backend to an aux proc or vice versa we
really don't want to go on.

+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process",
+ pid));
I wonder if we should differentiate between the warnings? When we hit this in
the loop the errmsg is describing a slightly different case. I did leave it
for now, but it's food for thought if we should perhaps reword this one.

+ ereport(LOG,
+ errmsg("Wait for %d process to publish stats timed out, trying again",
+ pid));
This should probably by DEBUG1, in a congested cluster it might cause a fair
bit of logging which isn't really helping the user. Also, nitpick, errmsg
starts with a lowercase letter.

+static Size
+MemCtxShmemSize(void)
We don't really need this function anymore and keeping it separate we risk it
going out of sync with the matching calcuation in MemCtxBackendShmemInit, so I
think we should condense into one.

else
{
+ Assert(print_location == PRINT_STATS_NONE);
Rather than an if-then-else and an assert we can use a switch statement without
a default, this way we'll automatically get a warning if a value is missed.

+ ereport(LOG,
+ errmsg("hash table corrupted, can't construct path value"));
I know you switched from elog(LOG.. to ereport(LOG.. but I still think a LOG
entry stating corruption isn't helpful, it's not actionable for the user.
Given that it's a case that shouldn't happen I wonder if we should downgrade it
to an Assert(false) and potentially a DEBUG1?

--
Daniel Gustafsson

Attachment	Content-Type	Size
v19-0001-Preparatory-changes-for-reporting-memory-context.patch	application/octet-stream	4.3 KB
v19-0002-Function-to-report-memory-context-statistics.patch	application/octet-stream	54.3 KB
v19-0003-Review-comments-and-fixups.patch	application/octet-stream	18.2 KB
unknown_filename	text/plain	2 bytes

In response to

Re: Enhancing Memory Context Statistics Reporting at 2025-03-20 07:39:45 from Rahila Syed

Responses

Re: Enhancing Memory Context Statistics Reporting at 2025-03-26 10:34:17 from Rahila Syed

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Burd, Greg	2025-03-25 14:21:14	Re: Expanding HOT updates for expression and partial indexes
Previous Message	Noah Misch	2025-03-25 14:11:20	Re: AIO v2.5