Re: shared-memory based stats collector

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: gkokolatos(at)protonmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2021-03-10 12:47:51
Message-ID: 2f3ce712-0756-8584-6a3f-9c6fe5c43dfa@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/03/10 17:51, Kyotaro Horiguchi wrote:
> At Wed, 10 Mar 2021 15:20:43 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>> On 2021/03/10 12:10, Kyotaro Horiguchi wrote:
>>> Agreed. The code moved to the original place and added the crash
>>> handling code. And I added a phrase to the comment.
>>> + * Was it the archiver? If exit status is zero (normal) or one (FATAL
>>> + * exit), we assume everything is all right just like normal backends
>>> + * and just try to restart a new one so that we immediately retry
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> + * archiving of remaining files. (If fail, we'll try again in future
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>
>> "of" of "archiving of remaining" should be replaced with "the", or removed?
>
> Either will do. I don't mind turning the gerund (archiving) into a
> gerund phrase (archiving remaining files).
>
>> Just for record. Previously LogChildExit() was called and the following LOG
>> message was output when the archiver reported FATAL error. OTOH the patch
>> prevents that and the following LOG message is not output at FATAL exit of
>> archiver. But I don't think that the following message is required in that
>> case
>> because FATAL message indicating the similar thing is already output.
>> Therefore, I'm ok with the patch.
>>
>> LOG: archiver process (PID 46418) exited with exit code 1
>
> Yeah, that's the same behavor with wal receiver.
>
>>>> I read v50_003 patch.
>>>>
>>>> When archiver dies, ProcGlobal->archiverLatch should be reset to NULL,
>>>> like walreceiver does the similar thing in WalRcvDie()?
>>> Differently from walwriter and checkpointer, archiver as well as
>>> walreceiver may die while server is running. Leaving the latch pointer
>>> alone may lead to nudging a wrong process that took over the same
>>> procarray slot. Added pgarch_die() to do that.
>>
>> Thanks!
>>
>> + if (IsUnderPostmaster && ProcGlobal->archiverLatch)
>> + SetLatch(ProcGlobal->archiverLatch);
>>
>> The latch can be reset to NULL in pgarch_die() between the if-condition and
>> SetLatch(), and which would be problematic. Probably we should protect
>> the access to the latch by using spinlock, like we do for walreceiver's latch?
>
> Ugg. Right. I remember about that bug. I moved the archiverLatch out
> of ProcGlobal to a dedicate local struct PgArch and placed a spinlock
> together.
>
> Thanks for the review! v52 is attached.

Thanks! I applied minor and cosmetic changes to the 0003 patch as follows.
Attached is the updated version of the 0003 patch. Barring any objection,
I will commit this patch.

-#include "storage/latch.h"
-#include "storage/proc.h"

I removed these because they are no longer necessary.

<literal>logical replication worker</literal>,
<literal>parallel worker</literal>, <literal>background writer</literal>,
<literal>client backend</literal>, <literal>checkpointer</literal>,
+ <literal>archiver</literal>,
<literal>startup</literal>, <literal>walreceiver</literal>,
<literal>walsender</literal> and <literal>walwriter</literal>.

In the document about pg_stat_activity, possible values in backend_type
column are all listed. I added "archiver" into the list.

BTW, those values were originally listed in alphabetical order,
but ISTM that they were gotten out of order at a certain moment.
So they should be listed in alphabetical order again. This should
be implemented as a separate patch.

-PgArchData *PgArch = NULL;
+static PgArchData *PgArch = NULL;

I marked PgArchData as static because it's used only in pgarch.c.

- ShmemInitStruct("Archiver ", PgArchShmemSize(), &found);
+ ShmemInitStruct("Archiver Data", PgArchShmemSize(), &found);

I found that the original shmem name ended with unnecessary space character.
I replaced it with "Archiver Data".

In reaper(), I moved the code block for archiver to the original location.

I ran pgindent for the files that the patch modifies.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
v52-0003-Make-archiver-process-an-auxiliary-process_fujii.patch text/plain 20.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2021-03-10 13:06:16 Re: Change JOIN tutorial to focus more on explicit joins
Previous Message Peter Smith 2021-03-10 12:47:23 Re: [HACKERS] logical decoding of two-phase transactions