Re: suppress empty archive_command warning message

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Pavel Tide <paveltide(at)gmail(dot)com>
Cc: Yogesh Jadhav <pgyogesh(at)outlook(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-novice(at)lists(dot)postgresql(dot)org
Subject: Re: suppress empty archive_command warning message
Date: 2021-04-26 20:51:03
Message-ID: 20210426205103.GE20766@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Greetings,

* Pavel Tide (paveltide(at)gmail(dot)com) wrote:
> On Mon, Apr 26, 2021 at 7:35 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Pavel Tide (paveltide(at)gmail(dot)com) wrote:
> > > We have many postgres servers so we would prefer our central server to
> > > collect logs in packs, all at once, instead of the servers streaming
> > > them continuously.
> >
> > That doesn't really provide any detail as to *why* that's sensible to
> > do. The longer you wait for the WAL to actually be pushed, the more
> > data you're potentially going to lose if something bad happens.
>
> Well, our backup strategy involves having several image-based backups
> and a set of corresponding WAL segments connecting those images (i1,
> i2, i3). Something like this:
>
> i1 - wal1 - wal2 - wal3 - wal4 - i2 - wal5 - wal6 - wal7 - wal8 - i3 - wal9 ...
>
> And so on.
>
> That way, if we wanted to restore to image2-wal7 point-in-time it
> would take us less time to spin up an instance using i2 + applying
> WALs 5,6,7 rather than having to restore image1 and applying WALs from
> 1 to 7, wouldn't it?

Uh, it's not that simple though.

You absolutely *must* replay all of the WAL that existed at the time
that the snapshot was taken and only after all of that WAL has been
replayed can you stop WAL replay at some later point. There's
additional complexities if you have to deal with multiple storage
devices and tablespaces since, typically, snapshots are not guaranteed
across those and therefore you really need to actually do a
pg_start_backup and a pg_stop_backup (and save the backup label file..).

Note also that it's not obvious where exactly that ending point in the
WAL is wrt when a snapshot is taken because PG recycles WAL files and
you can't tell just by looking at the files on the filesystem which are
necessary to get back to consistency and which are from after that
point. PG also won't complain if you stop early, it'll just think that
it reached the end of WAL and will happily come up even though the
database will be corrupt.

In general, I discourage people from trying to build their own backup
system because it's simply not that easy to do correctly and you end up
with bad backups or corrupted databases when you restore...

> > > What if we put "sleep X" insude our archive command so that it will
> > > wait until we are ready to process the bunch of logs? Is there any
> > > limit for how long PG will wait for the command to return something?
> >
> > I hate to ask just what the archive command is that you're currently
> > using..
>
> It's not a command, in fact. We use an in-house-built utility that
> basically places itself as an archive command once in every X minutes
> and starts shipping logs to the central server.

What does that mean "places itself as an archive command"? You
absolutely can not just start copying WAL files out of the pg_wal
directory independently because PG recycles WAL files and the writes
into them and you don't really "know" when a WAL file has been finished
without taking other steps or arranging to have WAL files archived
through calls to archive_command...

> > Just adding a 'sleep' is certainly not what I'd recommend as
> > you could end up with a pile up of WAL that's slowly trickling out after
> > a large amount of writes. Seems like what you'd really want is to
> > archive somewhere else locally and then have a cronjob or similar
> > periodically run and push whatever is in the WAL across to the central
> > server.
>
> That could be an option, but we would like to avoid any "intermediate
> landing" of WAL segments.
> Copying those segments onto the same machine imposes doubled IO,
> whereas saving them somewhere else would require us to have some
> additional storage.

None of this explains why you want to wait to ship WAL to the central
server...

Thanks,

Stephen

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Pavel Tide 2021-04-26 22:30:34 Re: suppress empty archive_command warning message
Previous Message Pavel Tide 2021-04-26 20:34:20 Re: suppress empty archive_command warning message