From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Cary Huang <cary(dot)huang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: add checkpoint stats of snapshot and mapping files of pg_logical dir |
Date: | 2022-03-14 05:24:56 |
Message-ID: | CALj2ACWB8to-Y950t+tWE0FKSmvVKq2iOj=TgR7c0So2Fyi2jQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 14, 2022 at 10:45 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sun, Mar 13, 2022 at 02:58:58PM -0700, Nathan Bossart wrote:
> > On Sun, Mar 13, 2022 at 01:54:10PM +0530, Bharath Rupireddy wrote:
> >> Another thing I added in v2 is to not emit snapshot and mapping files
> >> stats in case of restartpoint as logical decoding isn't supported on
> >> standbys, so it doesn't make sense to emit the stats there and cause
> >> server log to grow unnecessarily. Having said that, I added a note
> >> there to change it whenever logical decoding on standbys is supported.
> >
> > I think we actually do want to include this information for restartpoints
> > since some files might be left over from the snapshot that was used to
> > create the standby. Also, presumably these functions could do some work
> > during recovery on a primary.
>
> Yes, I would agree that consistency makes sense here, and it is not
> complicated to add the code to support this code path anyway. There
> is a risk that folks working on logical decoding on standbys overse
> this code path, instead.
I agree to be consistent and emit the message even in case of restartpoint.
> > Another problem I see is that this patch depends on the return value of the
> > lstat() calls that we are trying to remove in 0001 from another thread [0].
> > I think the size of the removed/sync'd files is somewhat useful for
> > understanding disk space usage, but I suspect the time spent performing
> > these tasks is more closely related to the number of files. Do you think
> > reporting the sizes is worth the extra system call?
>
> We are not talking about files that are large either, are we?
>
> Another thing I am a bit annoyed with in this patch is the fact that
> the size of the ereport() call is doubled. The LOG currently
> generated is already bloated, and this does not arrange things.
Yes, this is a concern. Also, when there were no logical replication
slots on a plain server or the server removed or cleaned up all the
snapshot/mappings files, why would anyone want to have these messages
with all 0s in the server log?
Here's what I'm thinking:
Leave the existing "checkpoint/restartpoint complete" messages intact,
add the following in LogCheckpointEnd:
if (CheckpointStats.repl_map_files_rmvd_cnt ||
CheckpointStats.repl_map_files_syncd_cnt ||
CheckpointStats.repl_snap_files_rmvd_cnt)
{
ereport(LOG,
(errmsg("snapbuild snapshot file(s) removed="
UINT64_FORMAT ", size=%zu bytes, time=%ld.%03d s, cutoff LSN=%X/%X; "
"logical rewrite mapping file(s) removed="
UINT64_FORMAT ", size=%zu bytes, synced=" UINT64_FORMAT ", size=%zu
bytes, time=%ld.%03d s, cutoff LSN=%X/%X",
CheckpointStats.repl_snap_files_rmvd_cnt,
CheckpointStats.repl_snap_files_rmvd_sz,
repl_snap_msecs / 1000, (int) (repl_snap_msecs % 1000),
LSN_FORMAT_ARGS(CheckpointStats.repl_snap_cutoff_lsn),
CheckpointStats.repl_map_files_rmvd_cnt,
CheckpointStats.repl_map_files_rmvd_sz,
CheckpointStats.repl_map_files_syncd_cnt,
CheckpointStats.repl_map_files_syncd_sz,
repl_map_msecs / 1000, (int) (repl_map_msecs % 1000),
LSN_FORMAT_ARGS(CheckpointStats.repl_map_cutoff_lsn))));
}
Thoughts?
Regards,
Bharath Rupireddy.
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2022-03-14 05:31:12 | Re: BufferAlloc: don't take two simultaneous locks |
Previous Message | Thomas Munro | 2022-03-14 05:15:59 | Re: WIP: WAL prefetch (another approach) |