From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | masao(dot)fujii(at)oss(dot)nttdata(dot)com |
Cc: | gkokolatos(at)protonmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: shared-memory based stats collector |
Date: | 2021-03-10 08:51:37 |
Message-ID: | 20210310.175137.77410390117009989.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Wed, 10 Mar 2021 15:20:43 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> On 2021/03/10 12:10, Kyotaro Horiguchi wrote:
> > Agreed. The code moved to the original place and added the crash
> > handling code. And I added a phrase to the comment.
> > + * Was it the archiver? If exit status is zero (normal) or one (FATAL
> > + * exit), we assume everything is all right just like normal backends
> > + * and just try to restart a new one so that we immediately retry
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > + * archiving of remaining files. (If fail, we'll try again in future
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> "of" of "archiving of remaining" should be replaced with "the", or removed?
Either will do. I don't mind turning the gerund (archiving) into a
gerund phrase (archiving remaining files).
> Just for record. Previously LogChildExit() was called and the following LOG
> message was output when the archiver reported FATAL error. OTOH the patch
> prevents that and the following LOG message is not output at FATAL exit of
> archiver. But I don't think that the following message is required in that
> case
> because FATAL message indicating the similar thing is already output.
> Therefore, I'm ok with the patch.
>
> LOG: archiver process (PID 46418) exited with exit code 1
Yeah, that's the same behavor with wal receiver.
> >> I read v50_003 patch.
> >>
> >> When archiver dies, ProcGlobal->archiverLatch should be reset to NULL,
> >> like walreceiver does the similar thing in WalRcvDie()?
> > Differently from walwriter and checkpointer, archiver as well as
> > walreceiver may die while server is running. Leaving the latch pointer
> > alone may lead to nudging a wrong process that took over the same
> > procarray slot. Added pgarch_die() to do that.
>
> Thanks!
>
> + if (IsUnderPostmaster && ProcGlobal->archiverLatch)
> + SetLatch(ProcGlobal->archiverLatch);
>
> The latch can be reset to NULL in pgarch_die() between the if-condition and
> SetLatch(), and which would be problematic. Probably we should protect
> the access to the latch by using spinlock, like we do for walreceiver's latch?
Ugg. Right. I remember about that bug. I moved the archiverLatch out
of ProcGlobal to a dedicate local struct PgArch and placed a spinlock
together.
Thanks for the review! v52 is attached.
Other than the archiver fix above, A bug of 0004 in handling of
replication slot stats that leads to a hang is fixed. (It's the cause
of CF-bot failure.)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v52-0001-sequential-scan-for-dshash.patch | text/x-patch | 9.2 KB |
v52-0002-Add-conditional-lock-feature-to-dshash.patch | text/x-patch | 6.2 KB |
v52-0003-Make-archiver-process-an-auxiliary-process.patch | text/x-patch | 21.0 KB |
v52-0004-Shared-memory-based-stats-collector.patch | text/x-patch | 312.5 KB |
v52-0005-Doc-part-of-shared-memory-based-stats-collector.patch | text/x-patch | 20.2 KB |
v52-0006-Remove-the-GUC-stats_temp_directory.patch | text/x-patch | 13.7 KB |
v52-0007-Exclude-pg_stat-directory-from-base-backup.patch | text/x-patch | 1.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2021-03-10 09:03:24 | Re: Procedures versus the "fastpath" API |
Previous Message | Matthias van de Meent | 2021-03-10 08:35:10 | Re: Improvements and additions to COPY progress reporting |