Re: Enhancing Memory Context Statistics Reporting

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enhancing Memory Context Statistics Reporting
Date: 2025-04-07 23:17:17
Message-ID: B5BF7FC9-D9AA-4C43-AC28-8A6599056CC0@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 7 Apr 2025, at 17:43, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2025-04-07 15:41:37 +0200, Daniel Gustafsson wrote:
>> I think this function can be a valuable debugging aid going forward.
>
> What I am most excited about for this is to be able to measure server-wide and
> fleet-wide memory usage over time. Today I have actually very little idea
> about what memory is being used for across all connections, not to speak of a
> larger number of servers.

Thanks for looking, Rahila and I took a collective stab at the review comments.

>> + before_shmem_exit(AtProcExit_memstats_dsa_free, 0);
>> +
>> SetProcessingMode(NormalProcessing);
>> }
>
> How about putting it into BaseInit()? Or maybe just register it when its
> first used?

Moved to BaseInit().

>> +MEM_CTX_PUBLISH "Waiting for a process to publish memory information."
>
> The memory context stuff abbreviates as cxt not ctx. There's a few more cases
> of that in the patch.

I never get that right. Fixed.

>> + return (context_type);
>
> Why these parens?

Must be a leftover from something, fixed. Sorry about that.

>> + * If the publishing backend does not respond before the condition variable
>> + * times out, which is set to MEMSTATS_WAIT_TIMEOUT, retry given that there is
>> + * time left within the timeout specified by the user, before giving up and
>> + * returning previously published statistics, if any. If no previous statistics
>> + * exist, return NULL.
>
> Why do we need to repeatedly wake up rather than just sleeping with the
> "remaining" amount of time based on the time the function was called and the
> time that has passed since?

Fair point, the current coding was a conversion from the previous retry-based
approach but your suggestion is clearly correct. There is still potential for
refactoring but at this point I don't want to change too much all at once.

>> + * A valid DSA pointer isn't proof that statistics are available, it can
>> + * be valid due to previously published stats.
>
> Somehow "valid DSA pointer" is a bit too much about the precise mechanics and
> not enough about what's actually happening. I'd rather say something like
>
> "Even if the proc has published statistics, they may not be due to the current
> request, but previously published stats."

Agreed, thats better. Changed.

>> + if (!IsUnderPostmaster)
>> + {
>> + Assert(!found);
>
> I don't really understand why this uses IsUnderPostmaster? Seems like this
> should just use found like most (or all) the other *ShmemInit() functions do?

Agreed, Fixed.

>> + LWLockInitialize(&memCtxArea->lw_lock, LWLockNewTrancheId());
>
> I think for builtin code we just hardcode the tranches in BuiltinTrancheIds.

Fixed.

> It feels a bit silly to duplicate the call to context->methods->stats three
> times. We've changed these parameters a bunch in the past, having more callers
> to fix makes that more work. Can't the switch just set up the args that are
> then passed to one call to context->methods->stats?

I don't disagree, but I prefer to do that as a separate refactoring to not
change too many things all at once.

>> +
>> + /* Compute the number of stats that can fit in the defined limit */
>> + max_stats = (MAX_SEGMENTS_PER_BACKEND * DSA_DEFAULT_INIT_SEGMENT_SIZE)
>> + / (MAX_MEMORY_CONTEXT_STATS_SIZE);
>
> MAX_SEGMENTS_PER_BACKEND sounds way too generic to me for something defined in
> memutils.h. I don't really understand why DSA_DEFAULT_INIT_SEGMENT_SIZE is
> something that makes sense to use here?

Renamed, and dependency on DSA_DEFAULT_INIT_SEGMENT_SIZE removed.

>> + /*
>> + * Hold the process lock to protect writes to process specific memory. Two
>> + * processes publishing statistics do not block each other.
>> + */
>
> s/specific/process specific/

That's what it says though.. isn't it? I might be missing something obvious.

>> + dsa_free(area, memCtxState[idx].memstats_dsa_pointer);
>> + memCtxState[idx].memstats_dsa_pointer = InvalidDsaPointer;
>
> Both callers to free_memorycontextstate_dsa() do these lines immediately after
> calling free_memorycontextstate_dsa(), why not do that inside?

Fixed.

>> + /* Copy statistics to DSA memory */
>> + PublishMemoryContext(meminfo, context_id, cur, path, stat, 1, area, 100);
>> + }
>> + else
>> + {
>> + /* Examine the context stats */
>> + memset(&stat, 0, sizeof(stat));
>> + (*cur->methods->stats) (cur, NULL, NULL, &stat, true);
>
> But do we really do it twice in a row? The lines are exactly the same, so it
> seems that should just be done before the if?

Fixed.

>> +signal_memorycontext_reporting(void)
>
> IMO somewhat confusing to release the lock in a function named
> signal_memorycontext_reporting(). Why do we do that after
> hash_destroy()/dsa_detach()?

The function has been renamed for clarity.

>> + /* context id starts with 1 */
>> + entry->context_id = ++(*stats_count);
>
> Given that we don't actually do anything here relating to starting with 1, I
> find that comment confusing.

Reworded, not sure if it's much better tbh.

>> + memctx_info[curr_id].name = dsa_allocate0(area, namelen + 1);
>
> Given the number of references to memctx_info[curr_id] I'd put it in a local variable.

I might be partial, but I sort of prefer this way since it makes the underlying
data structure clear to the reader.

> Why is this a dsa_allocate0 given that we're immediately overwriting it?

It doesn't need to be zeroed as it's immediately overwritten. Fixed.

>> + memctx_info[curr_id].ident = dsa_allocate0(area, idlen + 1);
>> + identptr = (char *) dsa_get_address(area, memctx_info[curr_id].ident);
>> + strlcpy(identptr, ident, idlen + 1);
>
> Hm. First I thought we'd leak memory if this second (and subsequent)
> dsa_allocate failed. Then I thought we'd be ok, because the memory would be
> memory because it'd be reachable from memCtxState[idx].memstats_dsa_pointer.
>
> But I think it wouldn't *quite* work, because memCtxState[idx].total_stats is
> only set *after* we would have failed.

Keeping a running total in .total_stats should make the leak window smaller.

>> + memctx_info[curr_id].type = ContextTypeToString(context->type);
>
> I don't think this works across platforms. On windows / EXEC_BACKEND builds
> the location of string constants can differ across backends. And: Why do we
> need the string here? You can just call ContextTypeToString when reading?

Correct, we can just store the type and call ContextTypeToString when
generating the tuple. Fixed.

>> +/*
>> + * Free the memory context statistics stored by this process
>> + * in DSA area.
>> + */
>> +void
>> +AtProcExit_memstats_dsa_free(int code, Datum arg)
>> +{
>
> FWIW, to me the fact that it does a dsa_free() is an implementation
> detail. It's also not the only thing this does.

Renamed.

> And, I don't think AtProcExit* really is accurate, given that it runs *before*
> shmem is cleaned up?
>
> I wonder if the best approach here wouldn't be to forgo the use of a
> before_shmem_exit() callback, but instead use on_dsm_detach(). That would
> require we'd not constantly detach from the dsm segment, but I don't
> understand why we do that in the first place?

The attach/detach has been removed.

>> + /* If the dsm mapping could not be found, attach to the area */
>> + if (dsm_seg != NULL)
>> + return;
>
> I don't understand what we do here with the dsm? Why do we not need cleanup
> if we are already attached to the dsm segment?

Fixed.

>> +} MemoryContextState;
>
> IMO that's too generic a name for something in a header.
>
>> +} MemoryContextId;
>
> This too. Particularly because MemoryContextData->ident exist but is
> something different.

Renamed both to use MemoryContextReporting* namespace, which leaves
MemoryContextReportingBackendState at an unwieldly long name. I'm running out
of ideas on how to improve and it does make purpose quite explicit at least.

>> + from pg_get_process_memory_contexts(launcher_pid, false, 20)
>> + where path = '{1}' into r;
>> + RAISE NOTICE '%', r;
>> + select type, name, ident
>> + from pg_get_process_memory_contexts(pg_backend_pid(), false, 20)
>> + where path = '{1}' into r;
>> + RAISE NOTICE '%', r;
>> +END $$;
>
> I'd also test an aux process. I think the AV launcher isn't one, because it
> actually does "table" access of shared relations.

Fixed, switched from the AV launcher.

--
Daniel Gustafsson

Attachment Content-Type Size
v27-0001-Add-function-to-get-memory-context-stats-for-pro.patch application/octet-stream 65.5 KB
unknown_filename text/plain 1 byte

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2025-04-07 23:18:56 Re: [PATCH] Automatic client certificate selection support for libpq v1
Previous Message Michael Paquier 2025-04-07 23:00:48 Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits