Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

From: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
Date: 2024-09-16 14:30:35
Message-ID: 77032579-4dc3-4552-9a09-30aaa114c144@postgrespro.ru
Lists: pgsql-hackers

Hi!

On 13.09.2024 18:20, Fujii Masao wrote:
>
> If I understand correctly, restartpoints_timed and restartpoints_done were
> separated because a restartpoint can be skipped. restartpoints_timed counts
> when a restartpoint is triggered by a timeout, whether it runs or not,
> while restartpoints_done only tracks completed restartpoints.
>
> Similarly, I believe checkpoints should be handled the same way.
> Checkpoints can also be skipped when the system is idle, but currently,
> num_timed counts even the skipped ones, despite its documentation stating
> it's the "Number of scheduled checkpoints that have been performed."
>
> Why not separate num_timed into something like checkpoints_timed and
> checkpoints_done to reflect these different counters?

+1
This idea seems quite tenable to me.

One small clarification. Currently, if no restartpoints were skipped,
restartpoints_done equals restartpoints_timed + restartpoints_req.
The same should hold for checkpoints.
So I tried to introduce a num_done counter for checkpoints in the attached patch.
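
To make the intended relation between the counters concrete, here is a minimal sketch in Python. It is only a toy model of the bookkeeping described above, not the patch itself: the class and the record() and skipped() helpers are hypothetical names used for illustration, and only the counter names num_timed, num_requested and num_done come from this thread.

```python
from dataclasses import dataclass


@dataclass
class CheckpointerStats:
    """Toy model of the counters discussed above (not PostgreSQL code)."""
    num_timed: int = 0       # checkpoints triggered by timeout, run or skipped
    num_requested: int = 0   # checkpoints triggered by an explicit request
    num_done: int = 0        # checkpoints that were actually performed

    def record(self, requested: bool, performed: bool) -> None:
        # The trigger is always counted, whether the checkpoint runs or not.
        if requested:
            self.num_requested += 1
        else:
            self.num_timed += 1
        # Only a checkpoint that really ran is counted as done.
        if performed:
            self.num_done += 1

    def skipped(self) -> int:
        # Zero exactly when num_done == num_timed + num_requested.
        return self.num_timed + self.num_requested - self.num_done


stats = CheckpointerStats()
stats.record(requested=False, performed=True)   # timed, performed
stats.record(requested=True, performed=True)    # requested, performed
stats.record(requested=False, performed=False)  # timed, skipped
print(stats.skipped())
```

In this model the difference is non-zero only after a skipped checkpoint, which is the case the new counter is meant to expose.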

I'm not sure whether we should add a regression test for the case where num_done
is less than num_timed + num_requested. I haven't yet found a way to reproduce it quickly.

E.g. such a case can be reproduced when the log message "checkpoints are
occurring too frequently" appears, as follows:
- set checkpoint_timeout = 30 and checkpoint_warning = 40 in postgresql.conf
- start the server
- periodically run bulk insertions in the 1st client (e.g. insert into test values (generate_series(1,1E7));)
- watch pg_stat_checkpointer in the 2nd one:
# SELECT CURRENT_TIME; select * from pg_stat_checkpointer;
# \watch

After some time, the following will appear in the log:
2024-09-16 16:38:47.888 MSK [193733] LOG: checkpoints are occurring too frequently (13 seconds apart)
2024-09-16 16:38:47.888 MSK [193733] HINT: Consider increasing the configuration parameter "max_wal_size".

After that, num_timed + num_requested becomes greater than num_done.
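
When watching the view by hand, it can help to compare two snapshots rather than read the raw counters. The following Python sketch does that for rows represented as plain dictionaries. The function name skipped_between and the sample numbers are invented for illustration and are not output from a real server. Note also that, as an assumption about timing, a checkpoint still in progress may make num_done lag temporarily, so a non-zero result is only a hint.

```python
def skipped_between(before: dict, after: dict) -> int:
    """Growth of (num_timed + num_requested - num_done) between two snapshots."""
    def gap(row: dict) -> int:
        return row["num_timed"] + row["num_requested"] - row["num_done"]
    return gap(after) - gap(before)


# Hypothetical snapshots, e.g. copied from two iterations of \watch.
before = {"num_timed": 5, "num_requested": 2, "num_done": 7}
after = {"num_timed": 6, "num_requested": 9, "num_done": 13}

delta = skipped_between(before, after)
print(delta)
```

With these made-up values the gap grows from 0 to 2, i.e. two triggers in the interval were not matched by a completed checkpoint.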

It would be nice to find a simpler and faster way.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v1-0001-Introduce-num_done-counter-in-the-pg_stat_checkpointer.patch text/x-patch 10.7 KB
