From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Kasahara Tatsuhito <kasahara(dot)tatsuhito(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Georgios Kokolatos <gkokolatos(at)protonmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Get memory contexts of an arbitrary backend process |
Date: | 2021-01-04 05:34:50 |
Message-ID: | 9a59401241e9269254e70c8e0ef1cc7a@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Dec 25, 2020 at 6:08 PM Kasahara Tatsuhito
<kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
Thanks for reviewing and kind suggestion!
>> Attached a rewritten patch.
> Thanks for updating patch.
>
> But when I had applyed the patch to the current HEAD and did make, I
> got an error due to duplicate OIDs.
> You need to rebase the patch.
Assigned another OID.
>> Accordingly, I also slightly modified the basic design as below.
>>
>> ---
>> # Communication flow between the dumper and the requester
>> - (1) When requesting memory context dumping, the dumper changes
>> the struct on the shared memory state from 'ACCEPTABLE' to
>> 'REQUESTING'.
>> - (2) The dumper sends the signal to the dumper process and wait on
>> the latch.
>> - (3) When the dumper noticed the signal, it changes the state to
>> 'DUMPING'.
>> - (4) When the dumper completes dumping, it changes the state to
>> 'DONE' and set the latch.
>> - (5) The requestor reads the dump file and shows it to the user.
>> Finally, the requestor removes the dump file and reset the shared
>> memory state to 'ACCEPTABLE'.
>>
>> # Query cancellation
>> - When the requestor cancels dumping, e.g. signaling using ctrl-C,
>> the requestor changes the state of the shared memory to 'CANCELING'.
>> - The dumper checks the state when it tries to change the state to
>> 'DONE' at (4), and if the state is 'CANCELING', it initializes the
>> dump file and reset the shared memory state to 'ACCEPTABLE'.
>>
>> # Cleanup dump file and the shared memory
>> - In the normal case, the dumper removes the dump file and resets
>> the shared memory entry as described in (5).
>> - When something like query cancellation or process termination
>> happens on the dumper after (1) and before (3), in other words,
>> the state is 'REQUESTING', the requestor does the cleanup.
>> - When something happens on the dumper or the requestor after (3)
>> and before (4), in other words, the state is 'DUMPING', the dumper
>> does the cleanup. Specifically, if the requestor cancels the query,
>> it just changes the state to 'CANCELING' and the dumper notices it
>> and cleans up things later.
>> OTOH, when the dumper fails to dump, it cleans up the dump file and
>> reset the shared memory state.
>> - When something happens on the requestor after (4), i.e., the state
>> is 'DONE', the requestor does the cleanup.
>> - In the case of receiving SIGKILL or power failure, all dump files
>> are removed in the crash recovery process.
>> ---
> If the dumper is terminated before it dumps, the requestor will appear
> to enter an
> infinite loop because the status of mcxtdumpShmem will not change.
> The following are the steps to reproduce.
>
> - session1
> BEGIN; LOCK TABLE t;
> - session2
> SELECT * FROM t; -- wait
> - session3
> select pg_get_target_backend_memory_contexts(<pid of session2>);
> -- wait
> - session1
> select pg_terminate_backend(<pid of session2>); -- kill session2
> - session3 waits forever.
>
> Therefore, you may need to set mcxtdumpShmem->dump_status to
> MCXTDUMPSTATUS_CANCELING
> or other status before the dumper terminates.
In this case, it may be difficult for the dumper to change dump_status
because
it's waiting for latch and dump_memory_contexts() is not called yet.
Instead, it's possible for the requestor to check the existence of the
dumper
process periodically during waiting.
I added this logic to the attached patch.
> Also, although I have not been able to reproduce it, I believe that
> with the current design,
> if the requestor disappears right after the dumper dumps the memory
> information,
> the dump file will remain.
> Since the current design appears to allow only one requestor per
> instance, when the requestor
> requests a dump, it might be a good idea to delete any remaining dump
> files, if any.
Although I'm not sure when the dump file remains, deleting any remaining
dump
files seems good for safety.
I also added this idea to the attached patch.
> The following are comments on the code.
>
> + proc = BackendPidGetProc(dst_pid);
> +
> + if (proc == NULL)
> + {
> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL server process",
> dst_pid)));
> +
> + return (Datum) 1;
> + }
> For now, background writer, checkpointer and WAL writer are belong to
> the auxiliary process.
> Therefore, if we specify the PIDs of these processes for
> pg_get_target_backend_memory_contexts(),
> "PID xxxx is not a PostgreSQL server process" whould be outoput.
> This confuses the user.
> How about use AuxiliaryPidGetProc() to determine these processes?
Thanks and I modified the patch to output the below message when it's an
auxiliary process.
| PID %d is not a PostgreSQL backend process but an auxiliary process.
>
> + ereport(INFO,
> + (errmsg("The request has failed and now PID %d is
> requsting dumping.",
> + mcxtdumpShmem->src_pid)));
> +
> + LWLockRelease(McxtDumpLock);
> You can release LWLock before ereport.
Modified to release the lock before ereport.
> + Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING);
> typo?
> It might be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING".
Ops, it's a serious typo. Fixed it.
> + ereport(INFO,
> + (errmsg("PID %d is looked up by another process",
> pid)));
> This message is not accurate.
> Because, in the current design, only one session can request a dump,
> so if there are multiple requests, this message will be output even if
> the
> request is for a PID that is not looked up.
Exactly. Modified the message as below.
| Only one session can dump at a time and another session is dumping
currently.
> + if (fpout == NULL)
> + {
> + ereport(LOG,
> + (errcode_for_file_access(),
> + errmsg("could not write memory context file \"%s\":
> %m",
> + dumpfile)));
> It would be better to use "dump file" instead of "memory context file".
>
> +static void
> +McxtReqKill(int code, Datum arg)
> The function McxtReqKill looks like it should have a different name.
> (Since it doesn't do any kill).
> How about McxtReqcCleanup or something similar?
Thanks for your idea and I changed the name to McxtReqCleanup.
Regards,
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Add-pg_get_target_backend_memory_contexts-to-coll.patch | text/x-diff | 33.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Dilger | 2021-01-04 05:37:41 | Re: Test harness for regex code (to allow importing Tcl's test suite) |
Previous Message | Abhijit Menon-Sen | 2021-01-04 05:21:36 | Re: pg_waldump/heapdesc.c and struct field names |