Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-02-23 13:28:14
Message-ID: CAMm1aWZ07VYUPpvmV9Ck_VX-o+xH=hEBs-67scd_Us6U6FKbHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I will make use of pgstat_progress_update_multi_param() in the next
> patch to replace multiple calls to checkpoint_progress_update_param().

Fixed.
---

> > The other progress tables use [type]_total as column names for counter
> > targets (e.g. backup_total for backup_streamed, heap_blks_total for
> > heap_blks_scanned, etc.). I think that `buffers_total` and
> > `files_total` would be better column names.
>
> I agree and I will update this in the next patch.

Fixed.
---

> How about this "The checkpoint is started because max_wal_size is reached".
>
> "The checkpoint is started because checkpoint_timeout expired".
>
> "The checkpoint is started because some operation forced a checkpoint".

I have used the above description. Kindly let me know if any changes
are required.
---

> > > + <entry><literal>checkpointing CommitTs pages</literal></entry>
> >
> > CommitTs -> Commit time stamp
>
> I will handle this in the next patch.

Fixed.
---

> There are more scenarios where you can have a baackend requesting a checkpoint
> and waiting for its completion, and there may be more than one backend
> concerned, so I don't think that storing only one / the first backend pid is
> ok.

Thanks for this information. I am not considering backend_pid.
---

> I think all the information should be exposed. Only knowing why the current
> checkpoint has been triggered without any further information seems a bit
> useless. Think for instance for cases like [1].

I have supported all possible checkpoint kinds. Added
pg_stat_get_progress_checkpoint_kind() to convert the flags (int) to a
string representing a combination of flags and also checking for the
flag update in ImmediateCheckpointRequested() which checks whether
CHECKPOINT_IMMEDIATE flag is set or not. I did not find any other
cases where the flags get changed (which changes the current
checkpoint behaviour) during the checkpoint. Kindly let me know if I
am missing something.
---

> > I feel 'processes_wiating' aligns more with the naming conventions of
> > the fields of the existing progres views.
>
> There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have
> a strong opinion on it, just make sure to correct the typo.

More analysis is required to support this. I am planning to take care
in the next patch.
---

> If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a
> restartpoint if the checkpoint's timeline is different from the current
> timeline?

Fixed.

Sharing the v2 patch. Kindly have a look and share your comments.

Thanks & Regards,
Nitin Jadhav

On Tue, Feb 22, 2022 at 12:08 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> > > Thank you for sharing the information. 'triggering backend PID' (int)
> > > - can be stored without any problem. 'checkpoint or restartpoint?'
> > > (boolean) - can be stored as a integer value like
> > > PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and
> > > PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as
> > > start time in stat_progress, timestamp fits in 64 bits) - As
> > > Timestamptz is of type int64 internally, so we can store the timestamp
> > > value in the progres parameter and then expose a function like
> > > 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not
> > > Timestamptz) as argument and then returns string representing the
> > > elapsed time.
> >
> > No need to use a string there; I think exposing the checkpoint start
> > time is good enough. The conversion of int64 to timestamp[tz] can be
> > done in SQL (although I'm not sure that exposing the internal bitwise
> > representation of Interval should be exposed to that extent) [0].
> > Users can then extract the duration interval using now() - start_time,
> > which also allows the user to use their own preferred formatting.
>
> The reason for showing the elapsed time rather than exposing the
> timestamp directly is in case of checkpoint during shutdown and
> end-of-recovery, I am planning to log a message in server logs using
> 'log_startup_progress_interval' infrastructure which displays elapsed
> time. So just to match both of the behaviour I am displaying elapsed
> time here. I feel that elapsed time gives a quicker feel of the
> progress. Kindly let me know if you still feel just exposing the
> timestamp is better than showing the elapsed time.
>
> > > 'checkpoint start location' (lsn = uint64) - I feel we
> > > cannot use progress parameters for this case. As assigning uint64 to
> > > int64 type would be an issue for larger values and can lead to hidden
> > > bugs.
> >
> > Not necessarily - we can (without much trouble) do a bitwise cast from
> > uint64 to int64, and then (in SQL) cast it back to a pg_lsn [1]. Not
> > very elegant, but it works quite well.
> >
> > [1] SELECT '0/0'::pg_lsn + ((CASE WHEN stat.my_int64 < 0 THEN
> > pow(2::numeric, 64::numeric)::numeric ELSE 0::numeric END) +
> > stat.my_int64::numeric) FROM (SELECT -2::bigint /* 0xFFFFFFFF/FFFFFFFE
> > */ AS my_bigint_lsn) AS stat(my_int64);
>
> Thanks for sharing. It works. I will include this in the next patch.
> On Sat, Feb 19, 2022 at 11:02 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Fri, Feb 18, 2022 at 08:07:05PM +0530, Nitin Jadhav wrote:
> > >
> > > The backend_pid contains a valid value only during
> > > the CHECKPOINT command issued by the backend explicitly, otherwise the
> > > value will be 0. We may have to add an additional field to
> > > 'CheckpointerShmemStruct' to hold the backend pid. The backend
> > > requesting the checkpoint will update its pid to this structure.
> > > Kindly let me know if you still feel the backend_pid field is not
> > > necessary.
> >
> > There are more scenarios where you can have a baackend requesting a checkpoint
> > and waiting for its completion, and there may be more than one backend
> > concerned, so I don't think that storing only one / the first backend pid is
> > ok.
> >
> > > > And also while looking at the patch I see there's the same problem that I
> > > > mentioned in the previous thread, which is that the effective flags can be
> > > > updated once the checkpoint started, and as-is the view won't reflect that. It
> > > > also means that you can't simply display one of wal, time or force but a
> > > > possible combination of the flags (including the one not handled in v1).
> > >
> > > If I understand the above comment properly, it has 2 points. First is
> > > to display the combination of flags rather than just displaying wal,
> > > time or force - The idea behind this is to just let the user know the
> > > reason for checkpointing. That is, the checkpoint is started because
> > > max_wal_size is reached or checkpoint_timeout expired or explicitly
> > > issued CHECKPOINT command. The other flags like CHECKPOINT_IMMEDIATE,
> > > CHECKPOINT_WAIT or CHECKPOINT_FLUSH_ALL indicate how the checkpoint
> > > has to be performed. Hence I have not included those in the view. If
> > > it is really required, I would like to modify the code to include
> > > other flags and display the combination.
> >
> > I think all the information should be exposed. Only knowing why the current
> > checkpoint has been triggered without any further information seems a bit
> > useless. Think for instance for cases like [1].
> >
> > > Second point is to reflect
> > > the updated flags in the view. AFAIK, there is a possibility that the
> > > flags get updated during the on-going checkpoint but the reason for
> > > checkpoint (wal, time or force) will remain same for the current
> > > checkpoint. There might be a change in how checkpoint has to be
> > > performed if CHECKPOINT_IMMEDIATE flag is set. If we go with
> > > displaying the combination of flags in the view, then probably we may
> > > have to reflect this in the view.
> >
> > You can only "upgrade" a checkpoint, but not "downgrade" it. So if for
> > instance you find both CHECKPOINT_CAUSE_TIME and CHECKPOINT_FORCE (which is
> > possible) you can easily know which one was the one that triggered the
> > checkpoint and which one was added later.
> >
> > > > > Probably a new field named 'processes_wiating' or 'events_waiting' can be
> > > > > added for this purpose.
> > > >
> > > > Maybe num_process_waiting?
> > >
> > > I feel 'processes_wiating' aligns more with the naming conventions of
> > > the fields of the existing progres views.
> >
> > There's at least pg_stat_progress_vacuum.num_dead_tuples. Anyway I don't have
> > a strong opinion on it, just make sure to correct the typo.
> >
> > > > > Probably writing of buffers or syncing files may complete before
> > > > > pg_is_in_recovery() returns false. But there are some cleanup
> > > > > operations happen as part of the checkpoint. During this scenario, we
> > > > > may get false value for pg_is_in_recovery(). Please refer following
> > > > > piece of code which is present in CreateRestartpoint().
> > > > >
> > > > > if (!RecoveryInProgress())
> > > > > replayTLI = XLogCtl->InsertTimeLineID;
> > > >
> > > > Then maybe we could store the timeline rather then then kind of checkpoint?
> > > > You should still be able to compute the information while giving a bit more
> > > > information for the same memory usage.
> > >
> > > Can you please describe more about how checkpoint/restartpoint can be
> > > confirmed using the timeline id.
> >
> > If pg_is_in_recovery() is true, then it's a restartpoint, otherwise it's a
> > restartpoint if the checkpoint's timeline is different from the current
> > timeline?
> >
> > [1] https://www.postgresql.org/message-id/1486805889.24568.96.camel%40credativ.de

Attachment Content-Type Size
v2-0001-pg_stat_progress_checkpoint-view.patch application/octet-stream 35.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2022-02-23 13:34:59 Some optimisations for numeric division
Previous Message Amit Kapila 2022-02-23 11:10:06 Re: logical decoding and replication of sequences