Re: suppress empty archive_command warning message

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Pavel Tide <paveltide(at)gmail(dot)com>
Cc: Yogesh Jadhav <pgyogesh(at)outlook(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-novice(at)lists(dot)postgresql(dot)org
Subject: Re: suppress empty archive_command warning message
Date: 2021-04-27 14:59:16
Message-ID: 20210427145916.GG20766@tamriel.snowman.net
Lists: pgsql-novice

Greetings,

* Pavel Tide (paveltide(at)gmail(dot)com) wrote:
> > You absolutely *must* replay all of the WAL that existed at the time
> > that the snapshot was taken and only after all of that WAL has been
> > replayed can you stop WAL replay at some later point. There are
> > additional complexities if you have to deal with multiple storage
> > devices and tablespaces since, typically, snapshots are not guaranteed
> > across those and therefore you really need to actually do a
> > pg_start_backup and a pg_stop_backup (and save the backup label file).
>
> We do trigger a pg_start_backup right before taking a snapshot
> (simultaneous across all devices), and once the snapshot has been
> taken we trigger pg_stop_backup.

Ok, good, and you collect the backup_label that's returned by
pg_stop_backup and make sure to store it with that snapshot, and ensure
that when the snapshot is used you put the backup_label into place?
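
As a rough illustration of "putting the backup_label into place", here is
a minimal Python sketch. It assumes the non-exclusive backup API, where
pg_stop_backup returns the label file contents (and a tablespace map,
possibly empty) as text. The function name install_backup_label and its
arguments are hypothetical, not part of PostgreSQL or of any existing
tool.

```python
import os
from pathlib import Path


def install_backup_label(data_dir, label_text, spcmap_text=""):
    """Write backup_label (and tablespace_map, if non-empty) into data_dir.

    label_text and spcmap_text are assumed to be the strings returned by
    pg_stop_backup when the snapshot was taken. They are written in binary
    mode so the bytes are not altered by newline translation.
    """
    data_dir = Path(data_dir)
    files = [("backup_label", label_text)]
    if spcmap_text:
        files.append(("tablespace_map", spcmap_text))

    written = []
    for name, content in files:
        target = data_dir / name
        with open(target, "wb") as fh:
            fh.write(content.encode("utf-8"))
            fh.flush()
            os.fsync(fh.fileno())  # make sure the file reaches disk
        written.append(target)
    return written
```

This would be run against the restored snapshot's data directory before
PostgreSQL is started on it.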

> > What does that mean "places itself as an archive command"? You
> > absolutely cannot just start copying WAL files out of the pg_wal
> > directory independently because PG recycles WAL files and then writes
> > into them again, and you don't really "know" when a WAL file has been finished
> > without taking other steps or arranging to have WAL files archived
> > through calls to archive_command...
>
> I mean that we use it as an archive_command.
> Instead of placing some sort of "cp %p /mnt/nfs/%f" in
> archive_command, we use '/bin/paveltide_utility %p'.

Ok, that certainly wasn't clear from what you had written before, but if
you're at least archiving it through the archive_command then it should
be alright.
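
For reference, a minimal sketch of what a utility called from
archive_command needs to do, written in Python. This is not the utility
discussed above; archive_segment, main, and the destination directory
argument are assumptions made for illustration. The points it tries to
show are that the command must exit with zero only when the segment is
safely stored, and that it should refuse to overwrite a segment that is
already in the archive.

```python
import os
import shutil
from pathlib import Path


def archive_segment(wal_path, archive_dir):
    """Copy one completed WAL segment; return 0 on success, 1 on failure."""
    src = Path(wal_path)
    dest = Path(archive_dir) / src.name
    if dest.exists():
        return 1  # never overwrite an already-archived segment

    tmp = dest.with_name(dest.name + ".tmp")
    try:
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())  # data on disk before we report success
        os.rename(tmp, dest)  # the final name appears only for a complete copy
    except OSError:
        if tmp.exists():
            tmp.unlink()
        return 1
    return 0


def main(argv):
    """argv[1] is the path PostgreSQL passes as %p, argv[2] the archive dir."""
    if len(argv) != 3:
        return 2
    return archive_segment(argv[1], argv[2])
```

A small wrapper script would call sys.exit(main(sys.argv)), so that any
non-zero return value makes PostgreSQL keep the segment and retry later.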

> > None of this explains why you want to wait to ship WAL to the central
> > server...
>
> The server does not just accept the segments, but also uses its own
> database to keep a note of the segments saved on the storage, LSNs,
> and whatnot.
> Continuous shipping means a higher load on the server, which we would
> like to avoid.

This doesn't make sense, though: how would continuous shipping result in
a higher load? The amount of WAL doesn't change, as it's based on the
amount of data written to the database, so delaying it just means
you're going to have spikes of activity followed by idle periods.
Generally speaking, it's better to have a continuous lower level of
activity rather than such spikes (which is why we actually just bumped
the default for checkpoint_completion_target to 0.9, as an example...).
It's also better to get the WAL off the system as quickly as possible,
to minimize the risk of commits being lost.
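
The load argument can be made concrete with a little arithmetic. The
sketch below uses made-up numbers (360 segments of the default 16 MB size
generated over one hour, shipped either continuously or in a five-minute
batch); the function name is hypothetical. The total is the same in both
cases and only the rate the central server has to absorb changes.

```python
def shipping_rate_mb_s(total_wal_mb, shipping_window_s):
    """Average transfer rate needed to ship total_wal_mb within the window."""
    return total_wal_mb / shipping_window_s


SEGMENT_MB = 16          # default WAL segment size
segments_per_hour = 360  # assumed workload, for illustration only
total_mb = SEGMENT_MB * segments_per_hour  # 5760 MB either way

continuous = shipping_rate_mb_s(total_mb, 3600)  # spread over the whole hour
batched = shipping_rate_mb_s(total_mb, 300)      # crammed into five minutes

print(f"continuous: {continuous:.1f} MB/s, batched: {batched:.1f} MB/s")
# continuous: 1.6 MB/s, batched: 19.2 MB/s
```

With these assumed numbers the batched approach needs twelve times the
peak throughput, and up to an hour of WAL sits only on the database host
in the meantime.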

The way we'd probably want to actually implement this would be some kind
of "pause/resume" system for archiving of WAL, rather than just
suppressing messages about pretty clear misconfigurations or
suppressing the message about retrying failed attempts to archive a WAL
segment. That'd probably be a fair bit of code (and a potential foot-gun
for users), and I'm having a pretty hard time seeing the justification
for it.

Thanks,

Stephen
