From: | Rahila Syed <rahilasyed90(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enhancing Memory Context Statistics Reporting |
Date: | 2025-01-13 02:36:10 |
Message-ID: | CAH2L28vrnzctbSZ+P3wkfHV5-8G6jgcMd7MKo3C9mWZYh9+7Ng@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Tomas,
Thank you for the review.
On Tue, Jan 7, 2025 at 2:32 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Hi Rahila,
>
> Thanks for the updated and rebased patch. I've tried the pgbench test
> again, to see if it gets stuck somewhere, and I'm observing this on a
> new / idle cluster:
>
> $ pgbench -n -f test.sql -P 1 test -T 60
> pgbench (18devel)
> progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
> progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
> progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
> progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
> progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
> progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
> progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
> progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
> progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
> progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
> progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
> progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
> progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
> progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
> progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
> progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
> ...
>
> There's always a period of 10-15 seconds when everything seems to be
> working fine, and then a couple seconds when it gets stuck, with the usual
>
> LOG: Wait for 69454 process to publish stats timed out, trying again
>
> The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
> of that are processes that should be handling the signal, so how come it
> gets stuck every now and then? The system is entirely idle, there's no
> contention for the shmem stuff, etc. Could it be forgetting about the
> signal in some cases, or something like that?
>
> I am not sure as of now, I will debug further. Meanwhile, I have addressed
the
review comments. Please find the details and an updated patch below.
>
> 1) The SGML docs talk about "contexts at level" but I don't think that's
> defined/explained anywhere, there are different ways to assign levels in
> a tree-like structure, so it's unclear if levels are assigned from the
> top or bottom.
>
Fixed.
>
> 2) volatile sig_atomic_t PublishMemoryContextPending = false;
>
I'd move this right after LogMemoryContextPending (to match the other
> places that add new stuff).
>
Done.
>
> 3) typedef enum PrintDetails
>
> I suppose this should have some comments, explaining what the typedef is
> for. Also, "details" sounds pretty generic, perhaps "destination" or
> maybe "target" would be better?
>
> I added the comments above the typedef and changed the name to
PrintDestination.
> 4) The memcpy here seems unnecessary - the string is going to be static
> in the binary, no need to copy it. In which case the whole switch is
> going to be the same as in PutMemoryContextsStatsTupleStore, so maybe
> move that into a separate function?
>
> + switch (context->type)
> + {
> + case T_AllocSetContext:
> + type = "AllocSet";
> + strncpy(memctx_info[curr_id].type, type, strlen(type));
> + break;
> + case T_GenerationContext:
> + type = "Generation";
> + strncpy(memctx_info[curr_id].type, type, strlen(type));
> + break;
> + case T_SlabContext:
> + type = "Slab";
> + strncpy(memctx_info[curr_id].type, type, strlen(type));
> + break;
> + case T_BumpContext:
> + type = "Bump";
> + strncpy(memctx_info[curr_id].type, type, strlen(type));
> + break;
> + default:
> + type = "???";
> + strncpy(memctx_info[curr_id].type, type, strlen(type));
> + break;
> + }
>
Got rid of the copy and moved the switch to a separate function.
>
> 5) The comment about hash table in ProcessGetMemoryContextInterrupt
> seems pretty far from hash_create(), so maybe move it.
>
> Was fixed in your suggestions patch.
6) ProcessGetMemoryContextInterrupt seems pretty long / complex, with
> multiple nested loops, it'd be good to split it into smaller parts that
> are easier to understand.
>
> Done the refactoring to move certain parts into separate functions.
> 7) I'm not sure if/why we need to move MemoryContextId to memutils.h.
>
This is because I am referencing it from both mcxt.c and mcxtfuns.c. I can
consider moving some of the code out of mcxt.c and consolidating
everything related to this patch in mcxtfuncs.c if mcxt.c is intended to
contain only the core memory context logic.
> 8) The new stuff in memutils.h is added to the wrong place, into a
> section labeled "Memory-context-type-specific functions" (which it
> certainly is not)
>
> Fixed.
> 9) autovacuum.c adds the ProcessGetMemoryContextInterrupt() call after
> ProcessCatchupInterrupt() - that's not wrong, but I'd move it right
> after ProcessLogMemoryContextInterrupt(), just like everywhere else.
>
> Fixed too.
> 10) The pg_get_process_memory_contexts comment says:
>
> Signal a backend or an auxiliary process to send its ...
>
> But this is not just about the signal, it also waits for the results and
> produces the result set.
Makes sense, edited accordingly.
>
>
11) pg_get_process_memory_contexts - Wouldn't it be better to move the
> InitMaterializedSRF() call until after the privilege check, etc.?
>
> I have moved it after the super user check but kept it before some other
checks that lead to WARNING, after looking at how other functions have done
it.
> 12) The pg_get_process_memory_contexts comment should explain why it's
> superuser-only function. Presumably it has similar DoS risks as the
> other functions, because if not why would we have the restriction?
>
> Edited accordingly.
> 13) I reworded and expanded the pg_get_process_memory_contexts comment a
> bit, and re-wrapped it too. But I think it also needs to explain how it
> communicates with the other process (sending signal, sending data
> through a DSA, ...). And also how the timeouts work.
>
> Thank you for improving the comments. Added remaining changes as requested.
> 14) I'm a bit confused about the DSA allocations (but I also haven't
> worked with DSA very much, so maybe it's fine). Presumably the 16MB is
> upper limit, we won't use that all the time. We allocate 1MB, but allow
> it to grow up to 16MB, correct?
Yes.
16MB seems like a lot, certainly enough
> for this purpose - if it's not, I don't think we can come up with a
> better limit.
>
> I can try reducing it to 8MB, although it's expected to be only allocated
when needed.
> 15) In any case, I don't think the 16 should be hardcoded as a magic
> constant in multiple places. That's bound to be error-prone.
>
> Done.
> 16) I've reformatted / reindented / wrapped the code in various places,
> to make it easier to read and more consistent with the nearby code. I
> also added a bunch of comments explaining what the block of code is
> meant to do (I mean, what it aims to do).
>
> Thank you
> 16) A comment in pg_get_process_memory_contexts says:
>
> Pin the mapping so that it doesn't throw a warning
>
> That doesn't seem very useful. It's not clear what kind of warning this
> hides, but more importantly - we're not doing stuff to hide some sort of
> warning, we do it to prevent what the warning is about.
>
> Makes sense, fixed.
> 17) pg_get_process_memory_contexts has a bunch of error cases, where we
> need to detach the DSA and return NULL. Would be better to do a label
> with a goto, I think.
>
Done.
> 18) I think pg_get_process_memory_contexts will have issues if this
> happens in the first loop:
>
> if ((memCtxState[procNumber].proc_id == pid) &&
> DsaPointerIsValid(memCtxState[procNumber].memstats_dsa_pointer))
> break;
>
> Because then we end up with memctx_info pointing to garbage after the
> loop. I don't know how hard is to hit this, I guess it can happen in
> many processes calling pg_get_process_memory_contexts?
>
I think this is not possible since if the breaking condition is met, it
means
memstats_dsa_pointer is valid and memctx_info which resides
at mestats_dsa_pointer will contain valid data. Am I missing something?
Regarding the proc_id == pid check, I have added a comment in the code as
requested.
> 19) Minor comment and formatting of MemCtxShmemSize / MemCtxShmemInit.
>
> Ok.
20) MemoryContextInfo etc. need to be added to typedefs.list, so that
> pgindent can do the right thing.
>
> Done.
21) I think ProcessGetMemoryContextInterrupt has a bug because it uses
> get_summary before reading it from the shmem.
>
Fixed. It was not showing up in tests as the result of the bug was some
extra memory allocation
in dsa and some extra computation to populate all the paths in hash table
inspite of
get_summary being true.
>
> Attached are two patches - 0001 is the original patch, 0002 has most of
> my review comments (mentioned above), and a couple additional changes to
> comments/formatting, etc. Those are suggestions rather than issues.
>
> Thank you, applied the 0002 patch and made the changes mentioned in XXX.
Answering some of your questions in the 0002 patch below:
Q. * XXX Also, what if we fill exactly this number of contexts? Won't we
* lose the last entry because it will be overwitten by the summary?
A. We are filling 0 to max_stats - 2 slots by memory context in the loop
foreach_ptr(MemoryContextData, cur, contexts) in
ProcessGetMemoryContextInterrupt.
max_stats - 1 is reserved for the summary statistics.
Q. /* XXX I don't understand why we need to check get_summary here? */
A. get_summary check is there to ensure that the context_id is inserted in
the
hash_table if get_summary is true. If get_summary is true, the loop will
break after the first iteration
and the entire main list of contexts won't be traversed and hence
context_ids won't be inserted.
Hence it is handled separately inside a check for get_summary.
Q. /* XXX What if the memstats_dsa_pointer is not valid? Is it even
possible?
* If it is, we have garbage in memctx_info. Maybe it should be an
Assert()? */
A . Agreed. Changed it to an assert.
Q. /*
* XXX isn't 2 x 1kB for every context a bit too much? Maybe better
to
* make it variable-length?
*/
A. I don't know how to do this for a variable in shared memory, won't that
mean
allocating from the heap and thus the pointer would become invalid in
another
process?
Thank you,
Rahila Syed
Attachment | Content-Type | Size |
---|---|---|
v10-0001-Function-to-report-memory-context-stats-of-any-backe.patch | application/octet-stream | 44.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Suraj Kharage | 2025-01-13 02:56:03 | Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints |
Previous Message | Richard Guo | 2025-01-13 02:04:34 | Re: Eager aggregation, take 3 |