Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-02-28 13:10:25
Message-ID: CAEze2Wh_+P9Hs4RQ=tc=70rdB5h0omGx5Kz-BLhsJPRnKQ+cWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> 3) Why do we need this extra calculation for start_lsn?
> Do you ever see a negative LSN or something here?
> + ('0/0'::pg_lsn + (
> + CASE
> + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> + ELSE (0)::numeric
> + END + (s.param3)::numeric)) AS start_lsn,

Yes: LSN can take up all of an uint64; whereas the pgstat column is a
bigint type; thus the signed int64. This cast is OK as it wraps
around, but that means we have to take care to correctly display the
LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using
the special-casing for negative values.

As to whether it is reasonable: Generating 16GB of wal every second
(2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth
has been > 20GB/sec for a while); and that leaves you 2^29 seconds of
database runtime; or about 17 years. Seeing that a cluster can be
`pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at
least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)),
we can assume that clusters hitting LSN=2^63 will be a reasonable
possibility within the next few years. As the lifespan of a PG release
is about 5 years, it doesn't seem impossible that there will be actual
clusters that are going to hit this naturally in the lifespan of PG15.

It is also possible that someone fat-fingers pg_resetwal; and creates
a cluster with LSN >= 2^63; resulting in negative values in the
s.param3 field. Not likely, but we can force such situations; and as
such we should handle that gracefully.

> 4) Can't you use timestamptz_in(to_char(s.param4)) instead of
> pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> the reasoning for having this function and it's named as *checkpoint*
> when it doesn't do anything specific to the checkpoint at all?

I hadn't thought of using the types' inout functions, but it looks
like timestamp IO functions use a formatted timestring, which won't
work with the epoch-based timestamp stored in the view.

> Having 3 unnecessary functions that aren't useful to the users at all
> in proc.dat will simply eatup the function oids IMO. Hence, I suggest
> let's try to do without extra functions.

I agree that (1) could be simplified, or at least fully expressed in
SQL without exposing too many internals. If we're fine with exposing
internals like flags and type layouts, then (2), and arguably (4), can
be expressed in SQL as well.

-Matthias

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-02-28 13:15:51 Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Previous Message Ashutosh Bapat 2022-02-28 13:03:30 Re: real/float example for testlibpq3