Quick Links

Re: Logging idle checkpoints

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	sfrost(at)snowman(dot)net, vik(dot)fearing(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Logging idle checkpoints
Date:	2017-10-03 04:17:00
Message-ID:	20171003.131700.169031419.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

At Tue, 3 Oct 2017 10:23:08 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQ3Q1J_wBC7yPXk39dO0RGvbM4-nYp2gMrCJ7pfPJXcYw(at)mail(dot)gmail(dot)com>
> On Tue, Oct 3, 2017 at 12:01 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I certainly don't care for the idea of adding log messages saying we
> > aren't doing anything just to match a count that's incorrectly claiming
> > that checkpoints are happening when they aren't.
> >
> > The down-thread suggestion of keeping track of skipped checkpoints might
> > be interesting, but I'm not entirely convinced it really is. We have
> > time to debate that, of course, but I don't really see how that's
> > helpful. At the moment, it seems like the suggestion to add that column
> > is based on the assumption that we're going to start logging skipped
> > checkpoints and having that column would allow us to match up the count
> > between the new column and the "skipped checkpoint" messages in the logs
> > and I can not help but feel that this is a ridiculous amount of effort
> > being put into the analysis of something that *didn't* happen.
>
> Being able to look at how many checkpoints are skipped can be used as
> a tuning indicator of max_wal_size and checkpoint_timeout, or in short
> increase them if those remain idle.

We ususally adjust the GUCs based on how often checkpoint is
*executed* and how many of the executed checkpoints have been
triggered by xlog progress (or with shorter interval than
timeout). It seems enough. Counting skipped checkpoints gives
just a rough estimate of how long the system was getting no
substantial updates. I doubt that users get something valuable by
counting skipped checkpoints.

> Since their introduction in
> 335feca4, m_timed_checkpoints and m_requested_checkpoints track the
> number of checkpoint requests, not if a checkpoint has been actually
> executed or not, I am not sure that this should be changed after 10
> years. So, to put it in other words, wouldn't we want a way to track
> checkpoints that are *executed*, meaning that we could increment a
> counter after doing the skip checks in CreateRestartPoint() and
> CreateCheckPoint()?

This sounds reasonable to me.

CreateRestartPoint() is already returning ckpt_performed, it is
used to let checkpointer retry in 15 seconds rather than waiting
the next checkpoint_timeout. Checkpoint might deserve the same
treatment on skipping.

By the way RestartCheckPoint emits DEBUG2 messages on skipping.
Although restartpoint has different characteristics from
checkpoint, if we change the message level for CreateCheckPoint
(currently DEBUG1), CreateRestartPoint might should get the same
change. (Elsewise at least they ought to have the same message
level?)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: Logging idle checkpoints at 2017-10-03 01:23:08 from Michael Paquier

Responses

Re: Logging idle checkpoints at 2017-10-03 12:22:27 from Stephen Frost

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2017-10-03 04:21:02	Re: Conversion error
Previous Message	Tatsuo Ishii	2017-10-03 04:16:53	Re: Conversion error