Re: Enhancing Memory Context Statistics Reporting

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Rahila Syed <rahilasyed90(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enhancing Memory Context Statistics Reporting
Date: 2025-01-06 21:02:28
Message-ID: 5a899047-65fc-47c5-9c35-1035f44956c6@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Rahila,

Thanks for the updated and rebased patch. I've tried the pgbench test
again, to see if it gets stuck somewhere, and I'm observing this on a
new / idle cluster:

$ pgbench -n -f test.sql -P 1 test -T 60
pgbench (18devel)
progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
...

There's always a period of 10-15 seconds when everything seems to be
working fine, and then a couple seconds when it gets stuck, with the usual

LOG: Wait for 69454 process to publish stats timed out, trying again

The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
of that are processes that should be handling the signal, so how come it
gets stuck every now and then? The system is entirely idle, there's no
contention for the shmem stuff, etc. Could it be forgetting about the
signal in some cases, or something like that?

The test.sql is super simple:

SELECT * FROM pg_get_process_memory_contexts(
(SELECT pid FROM pg_stat_activity
WHERE pid != pg_backend_pid()
ORDER BY random() LIMIT 1)
, false);

Aside from this, I went through the patch to do a regular review, so
here's the main comments in somewhat random order:

1) The SGML docs talk about "contexts at level" but I don't think that's
defined/explained anywhere, there are different ways to assign levels in
a tree-like structure, so it's unclear if levels are assigned from the
top or bottom.

2) volatile sig_atomic_t PublishMemoryContextPending = false;

I'd move this right after LogMemoryContextPending (to match the other
places that add new stuff).

3) typedef enum PrintDetails

I suppose this should have some comments, explaining what the typedef is
for. Also, "details" sounds pretty generic, perhaps "destination" or
maybe "target" would be better?

4) The memcpy here seems unnecessary - the string is going to be static
in the binary, no need to copy it. In which case the whole switch is
going to be the same as in PutMemoryContextsStatsTupleStore, so maybe
move that into a separate function?

+ switch (context->type)
+ {
+ case T_AllocSetContext:
+ type = "AllocSet";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_GenerationContext:
+ type = "Generation";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_SlabContext:
+ type = "Slab";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ case T_BumpContext:
+ type = "Bump";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ default:
+ type = "???";
+ strncpy(memctx_info[curr_id].type, type, strlen(type));
+ break;
+ }

5) The comment about hash table in ProcessGetMemoryContextInterrupt
seems pretty far from hash_create(), so maybe move it.

6) ProcessGetMemoryContextInterrupt seems pretty long / complex, with
multiple nested loops, it'd be good to split it into smaller parts that
are easier to understand.

7) I'm not sure if/why we need to move MemoryContextId to memutils.h.

8) The new stuff in memutils.h is added to the wrong place, into a
section labeled "Memory-context-type-specific functions" (which it
certainly is not)

9) autovacuum.c adds the ProcessGetMemoryContextInterrupt() call after
ProcessCatchupInterrupt() - that's not wrong, but I'd move it right
after ProcessLogMemoryContextInterrupt(), just like everywhere else.

10) The pg_get_process_memory_contexts comment says:

Signal a backend or an auxiliary process to send its ...

But this is not just about the signal, it also waits for the results and
produces the result set.

11) pg_get_process_memory_contexts - Wouldn't it be better to move the
InitMaterializedSRF() call until after the privilege check, etc.?

12) The pg_get_process_memory_contexts comment should explain why it's
superuser-only function. Presumably it has similar DoS risks as the
other functions, because if not why would we have the restriction?

13) I reworded and expanded the pg_get_process_memory_contexts comment a
bit, and re-wrapped it too. But I think it also needs to explain how it
communicates with the other process (sending signal, sending data
through a DSA, ...). And also how the timeouts work.

14) I'm a bit confused about the DSA allocations (but I also haven't
worked with DSA very much, so maybe it's fine). Presumably the 16MB is
upper limit, we won't use that all the time. We allocate 1MB, but allow
it to grow up to 16MB, correct? 16MB seems like a lot, certainly enough
for this purpose - if it's not, I don't think we can come up with a
better limit.

15) In any case, I don't think the 16 should be hardcoded as a magic
constant in multiple places. That's bound to be error-prone.

16) I've reformatted / reindented / wrapped the code in various places,
to make it easier to read and more consistent with the nearby code. I
also added a bunch of comments explaining what the block of code is
meant to do (I mean, what it aims to do).

16) A comment in pg_get_process_memory_contexts says:

Pin the mapping so that it doesn't throw a warning

That doesn't seem very useful. It's not clear what kind of warning this
hides, but more importantly - we're not doing stuff to hide some sort of
warning, we do it to prevent what the warning is about.

17) pg_get_process_memory_contexts has a bunch of error cases, where we
need to detach the DSA and return NULL. Would be better to do a label
with a goto, I think.

18) I think pg_get_process_memory_contexts will have issues if this
happens in the first loop:

if ((memCtxState[procNumber].proc_id == pid) &&
DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
break;

Because then we end up with memctx_info pointing to garbage after the
loop. I don't know how hard is to hit this, I guess it can happen in
many processes calling pg_get_process_memory_contexts?

19) Minor comment and formatting of MemCtxShmemSize / MemCtxShmemInit.

20) MemoryContextInfo etc. need to be added to typedefs.list, so that
pgindent can do the right thing.

21) I think ProcessGetMemoryContextInterrupt has a bug because it uses
get_summary before reading it from the shmem.

Attached are two patches - 0001 is the original patch, 0002 has most of
my review comments (mentioned above), and a couple additional changes to
comments/formatting, etc. Those are suggestions rather than issues.

regards

--
Tomas Vondra

Attachment Content-Type Size
vtomas-0001-Function-to-report-memory-context-stats-of-an.patch text/x-patch 41.7 KB
vtomas-0002-review.patch text/x-patch 17.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-01-06 21:03:15 Re: allow changing autovacuum_max_workers without restarting
Previous Message Thomas Munro 2025-01-06 20:54:49 Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)