Re: suppress empty archive_command warning message

From: Pavel Tide <paveltide(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Yogesh Jadhav <pgyogesh(at)outlook(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-novice(at)lists(dot)postgresql(dot)org
Subject: Re: suppress empty archive_command warning message
Date: 2021-04-29 17:34:40
Message-ID: CAAnkphXPeo9hH=QE6cWnfCsvZCh=6BmMOOH4v2pUcWkfLTs5ug@mail.gmail.com
Lists: pgsql-novice

On Tue, Apr 27, 2021 at 4:59 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Pavel Tide (paveltide(at)gmail(dot)com) wrote:
> > > You absolutely *must* replay all of the WAL that existed at the time
> > > that the snapshot was taken and only after all of that WAL has been
> > > replayed can you stop WAL replay at some later point. There's
> > > additional complexities if you have to deal with multiple storage
> > > devices and tablespaces since, typically, snapshots are not guaranteed
> > > across those and therefore you really need to actually do a
> > > pg_start_backup and a pg_stop_backup (and save the backup label file..).
> >
> > We do trigger a pg_start_backup right before taking a snapshot
> > (simultaneous across all devices), and once the snapshot has been
> > taken we trigger pg_stop_backup.
>
> Ok, good, and you collect the backup_label that's returned by
> pg_stop_backup and make sure to store it with that snapshot, and ensure
> that when the snapshot is used you put the backup_label into place?

I didn't implement that part myself, so I will double-check with my
team. Thank you for the advice.
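
For reference, the sequence discussed above could be sketched roughly
as follows. This is only an illustration, not our actual tooling: it
assumes a non-exclusive backup on a PostgreSQL version that still has
pg_start_backup/pg_stop_backup (they were renamed in version 15), any
DB-API style cursor whose session stays open for the whole call, and a
hypothetical take_snapshot() callable standing in for the storage tool.

```python
from typing import Callable, Tuple


def snapshot_with_backup_label(
    cursor,
    take_snapshot: Callable[[], str],
    label: str = "snapshot",
) -> Tuple[str, str]:
    """Wrap a storage snapshot in a non-exclusive base backup.

    `cursor` is assumed to be a DB-API cursor; `take_snapshot` is a
    hypothetical callable that triggers the snapshot and returns its id.
    Returns (snapshot_id, backup_label_text).
    """
    # Arguments: label, fast checkpoint = false, exclusive = false.
    cursor.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    try:
        snapshot_id = take_snapshot()
    finally:
        # Always end backup mode, even if the snapshot failed.
        # In non-exclusive mode the label contents are returned here
        # instead of being written into the data directory.
        cursor.execute("SELECT labelfile FROM pg_stop_backup(false)")
        labelfile = cursor.fetchone()[0]
    return snapshot_id, labelfile
```

The returned label text would have to be stored alongside the snapshot
and written out as a file named backup_label in the data directory
before the restored snapshot is started.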

> This doesn't make sense though- how would continuous shipping result in
> a higher load? The amount of WAL doesn't change as it's based on the
> amount of data written to the database and so delaying it just means
> you're going to have spikes of activity and then periods of downtime.
> Generally speaking, it's better to have a continuous lower level of
> activity rather than such spikes (which is why we actually just bumped
> the default for checkpoint completion target to 0.9, as an example...).
> It's also better to get the WAL off the system as quickly as possible,
> to minimize the risk of commits being lost.

Well, writing WAL records in batches rather than one by one puts less
stress on our database.
It might also be tricky to have the utility we use tell the server
when it has something ready to ship, so that the server can come over
and fetch the logs.
The server would therefore have to come and ask once in a while, and
since the rate at which new segments are created is practically
unpredictable (it depends on many factors), it would have to send
requests quite frequently.
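
To make the trade-off in this exchange concrete, here is a toy
calculation. The numbers are made up and shipping_profile() is a
hypothetical helper, not part of any real tool: it only shows that
batching leaves the total amount of WAL unchanged while concentrating
the transfer into larger bursts.

```python
from typing import List, Sequence


def shipping_profile(wal_per_interval: Sequence[int], ship_every: int) -> List[int]:
    """Return the amount shipped in each interval.

    `wal_per_interval` is the WAL produced per interval (e.g. in MB);
    WAL is shipped once every `ship_every` intervals.
    """
    shipped = []
    pending = 0
    for i, produced in enumerate(wal_per_interval, start=1):
        pending += produced
        if i % ship_every == 0:
            shipped.append(pending)
            pending = 0
        else:
            shipped.append(0)
    if pending:
        # Flush whatever is left over at the end of the period.
        shipped[-1] += pending
    return shipped


wal = [16, 16, 16, 16, 16, 16]           # six intervals, 16 MB each
continuous = shipping_profile(wal, 1)    # ship every interval
batched = shipping_profile(wal, 3)       # ship every third interval

print(sum(continuous), max(continuous))  # 96 16
print(sum(batched), max(batched))        # 96 48
```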

> The way we'd probably want to actually implement this would be some kind
> of "pause/resume" system for archiving of WAL rather than just
> suppressing messages about pretty clear misconfigurations, or suppress
> the message about retrying failed attempts to archive a WAL segment, and
> that'd probably be a fair bit of code (and a potential foot-gun for
> users..) and I'm having a pretty hard time seeing the justification for
> it.

I think that pause/resume would work too.
While I agree with the foot-gun argument in general, I think that
users who subconsciously want to shoot themselves in the foot will,
uh, find a way no matter what : )

I also understand your concerns regarding the proposed design, so
thank you for your insights,

Cheers
