From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com> |
Cc: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date: | 2022-02-28 04:51:23 |
Message-ID: | CALj2ACXDpgbVBt48D+2X4Hb7XvROMAg7EnbM-tHB+gPoHvEQew@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Feb 27, 2022 at 8:44 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav
> <nitinjadhavpostgres(at)gmail(dot)com> wrote:
>
> Had a quick look over the v3 patch. I'm not sure if it's the best way
> to have pg_stat_get_progress_checkpoint_type,
> pg_stat_get_progress_checkpoint_kind and
> pg_stat_get_progress_checkpoint_start_time just for printing info in
> readable format in pg_stat_progress_checkpoint. I don't think these
> functions will ever be useful for the users.
>
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?
>
> 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
> directly instead of new function pg_stat_get_progress_checkpoint_kind?
> + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
> + (flags == 0) ? "unknown" : "",
> + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
> + (flags & CHECKPOINT_FORCE) ? "force " : "",
> + (flags & CHECKPOINT_WAIT) ? "wait " : "",
> + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
> + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
> + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");
>
> 3) Why do we need this extra calculation for start_lsn? Do you ever
> see a negative LSN or something here?
> + ('0/0'::pg_lsn + (
> + CASE
> + WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> + ELSE (0)::numeric
> + END + (s.param3)::numeric)) AS start_lsn,
>
> 4) Can't you use timestamptz_in(to_char(s.param4)) instead of
> pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> the reasoning for having this function and it's named as *checkpoint*
> when it doesn't do anything specific to the checkpoint at all?
>
> Having 3 unnecessary functions that aren't useful to the users at all
> in proc.dat will simply eatup the function oids IMO. Hence, I suggest
> let's try to do without extra functions.
Another thought for my review comment:
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?
I don't think using pg_is_in_recovery work here as it is taken after
the checkpoint has started. So, I think the right way here is to send
1 in CreateCheckPoint and 2 in CreateRestartPoint and use
CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint".
Continuing my review:
5) Do we need a special phase for this checkpoint operation? I'm not
sure in which cases it will take a long time, but it looks like
there's a wait loop here.
vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
if (nvxids > 0)
{
do
{
pg_usleep(10000L); /* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
}
Also, how about special phases for SyncPostCheckpoint(),
SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
it might be increase in future (?)), TruncateSUBTRANS()?
6) SLRU (Simple LRU) isn't a phase here, you can just say
PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.
+
+ pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
+ PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES);
CheckPointPredicate();
And :s/checkpointing SLRU pages/checkpointing predicate lock pages
+ WHEN 9 THEN 'checkpointing SLRU pages'
7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS
And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN
'processing file sync requests'
8) :s/Finalizing/finalizing
+ WHEN 14 THEN 'Finalizing'
9) :s/checkpointing snapshots/checkpointing logical replication snapshot files
+ WHEN 3 THEN 'checkpointing snapshots'
:s/checkpointing logical rewrite mappings/checkpointing logical
replication rewrite mapping files
+ WHEN 4 THEN 'checkpointing logical rewrite mappings'
10) I'm not sure if it's discussed, how about adding the number of
snapshot/mapping files so far the checkpoint has processed in file
processing while loops of
CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
be many logical snapshot or mapping files and users may be interested
in knowing the so-far-processed-file-count.
11) I think it's discussed, are we going to add the pid of the
checkpoint requestor?
Regards,
Bharath Rupireddy.
From | Date | Subject | |
---|---|---|---|
Next Message | Bharath Rupireddy | 2022-02-28 05:04:25 | Re: Synchronizing slots from primary to standby |
Previous Message | Amit Kapila | 2022-02-28 04:31:58 | Re: Design of pg_stat_subscription_workers vs pgstats |