Re: Get memory contexts of an arbitrary backend process

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, tomas(dot)vondra(at)2ndquadrant(dot)com, Kasahara Tatsuhito <kasahara(dot)tatsuhito(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get memory contexts of an arbitrary backend process
Date: 2020-09-25 07:28:02
Message-ID: ab8998e5af7977133cee43f9d53abd23@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for all your comments, I updated the patch.

On Tue, Sep 1, 2020 at 12:03 AM Kasahara Tatsuhito
<kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:

> - How about managing the status of signal send/receive and dump
> operations on a shared hash or others ?
> Sending and receiving signals, dumping memory information, and
> referencing dump information all work asynchronously.
> Therefore, it would be good to have management information to
> check
> the status of each process.
> A simple idea is that ..
> - send a signal to dump to a PID, it first record following
> information into the shared hash.
> pid (specified pid)
> loc (dump location, currently might be ASAP)
> recv (did the pid process receive a signal? first false)
> dumped (did the pid process dump a mem information? first
> false)
> - specified process receive the signal, update the status in the
> shared hash, then dumped at specified location.
> - specified process finish dump mem information, update the
> status
> in the shared hash.

I added a shared hash table consisted of minimal members
mainly for managing whether the file is dumped or not.
Some members like 'loc' seem useful in the future, but I
haven't added them since it's not essential at this point.

On 2020-09-01 10:54, Andres Freund wrote:

>> CREATE VIEW pg_backend_memory_contexts AS
>> - SELECT * FROM pg_get_backend_memory_contexts();
>> + SELECT * FROM pg_get_backend_memory_contexts(-1);
>
> -1 is odd. Why not use NULL or even 0?

Changed it from -1 to NULL.

>> + rc = fwrite(&name_len, sizeof(int), 1, fpout);
>> + rc = fwrite(name, name_len, 1, fpout);
>> + rc = fwrite(&idlen, sizeof(int), 1, fpout);
>> + rc = fwrite(clipped_ident, idlen, 1, fpout);
>> + rc = fwrite(&level, sizeof(int), 1, fpout);
>> + rc = fwrite(&parent_len, sizeof(int), 1, fpout);
>> + rc = fwrite(parent, parent_len, 1, fpout);
>> + (void) rc; /* we'll check
>> for error with ferror */
>> +
>> + }
> This format is not descriptive. How about serializing to json or
> something? Or at least having field names?

Added field names for each value.

>> + while (true)
>> + {
>> + CHECK_FOR_INTERRUPTS();
>> +
>> + pg_usleep(10000L);
>> +
>
> Need better signalling back/forth here.

Added a shared hash table that has a flag for managing whether the file
is dumped or not.
I modified it to use this flag.

>> + /*
>> + * Open a temp file to dump the current memory context.
>> + */
>> + fpout = AllocateFile(tmpfile, PG_BINARY_W);
>> + if (fpout == NULL)
>> + {
>> + ereport(LOG,
>> + (errcode_for_file_access(),
>> + errmsg("could not write temporary
>> memory context file \"%s\": %m",
>> + tmpfile)));
>> + return;
>> + }
>
> Probably should be opened with O_CREAT | O_TRUNC?

AllocateFile() calls fopen() and AFAIU fopen() with mode "w" corresponds
to open() with "O_WRONLY | O_CREAT | O_TRUNC".

Do you mean I should not use fopen() here?

On 2020-09-24 13:01, Michael Paquier wrote:
> On Thu, Sep 10, 2020 at 09:11:21PM +0900, Kasahara Tatsuhito wrote:
>> I think it's fine to have an interface to delete in an emergency, but
>> I agree that
>> users shouldn't be made aware of the existence or deletion of dump
>> files, basically.
>
> Per the CF bot, the number of tests needs to be tweaked, because we
> test each entry filtered out with is_deeply(), meaning that the number
> of tests needs to be updated to reflect that if the filtered list is
> changed:
> t/010_pg_basebackup.pl ... 104/109 # Looks like you planned 109 tests
> but ran 110.
> t/010_pg_basebackup.pl ... Dubious, test returned 255 (wstat 65280,
> 0xff00)
> All 109 subtests passed
>
> Simple enough to fix.

Incremented the number of tests.

Any thoughts?

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment Content-Type Size
0002-Enabled-pg_get_backend_memory_contexts-to-collect-ar.patch text/x-diff 24.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2020-09-25 07:40:17 [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
Previous Message Thomas Munro 2020-09-25 07:09:36 Re: Handing off SLRU fsyncs to the checkpointer