Re: WAL Archive command.

From: Scott Mead <scottm(at)openscg(dot)com>
To: Jerry Sievers <gsievers19(at)comcast(dot)net>
Cc: John Britto <john(at)sodexis(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: WAL Archive command.
Date: 2017-09-27 19:35:23
Message-ID: CAKq0gvKbrqfTLgF9mcELybhj_Jtf15Qb3PqomyBBS=pY4wn3Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Sep 27, 2017 at 2:55 PM, Jerry Sievers <gsievers19(at)comcast(dot)net>
wrote:

> John Britto <john(at)sodexis(dot)com> writes:
>
> > Hello,
> >
> > I have a streaming replication setup along with WAL archive.
> >
> > archive_command = ‘test ! -f /var/pg_archive/%f && cp %p <archive
> > location>%f && scp %p postgres(at)192(dot)168(dot)0(dot)123:<archive location>/%f'
> >
> > When the SCP command fails, the master repeatedly tries to send the
> > archived WAL to standby. But during this time, the pg_xlog directly
> > grew with newer WAL files.
> >
> > The streaming replication hasn't had the problem because on my check,
> > the WAL write location on the primary was same with the last WAL
> > location received/replayed in standby.
> >
> > Since the pg_xlog in the master had few newer WAL files, the master
> > archive is lagging to pick the current pg_xlog WAL file. When a new
> > WAL occur in the pg_xlog, Master picks the old WAL file to send to
> > the standby.
>
> Yup Pg is going to handle the unshipped WALs one at a time and it will
> do them in order, oldest (lowest file name) first.
>
> > How should I force the PostgreSQL to batch copy the lagging WAL files
> > to pg_archive and then send to standby. Can I do this manually using
> > rsync? I wonder how PostgreSQL knows the changes because it
> > maintains info in archive_status with extension as .ready and .done.
>
> I suggest you fix your basic archiving routine to complete and exit with
> success to postgres.
>

+1

scp %p host:/archive/%f
if [ "$?" -ne 0 ]
then
echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
cp %p /localarchive/%f
exit 0
fi

Essentially, always make sure that you are returning a 0 to postgres. If
there is a failure, either log it or handle it separately. This code
snippet is *NOT COMPLETE, *there's a lot more to do in order to make it
production ready and recoverable. The biggest issue I've had with scp is
that you have to set and enforce a timeout and trap the timeout. Note, the
above only works until your local machine (or the /localarchive partition)
runs out of space. It's *really* important that you have ultra solid
logging and alerting around this.

> And as for archive command scripts in general, simpler is better.
>
> If you want to manually ship them in bulk, you may do so but then will
> need to remove the corresponding archive_status/$foo.ready file so that
> postgres won't keep trying to ship the same one.
>

I'm a huge fan of this strategy, especially if you're sending to a remote
datacenter.

archive_command.sh:

cp %p /localarchive/%f
if [ "$?" -ne 0 ]
then
echo "`date`:FAILED:%p" >> /var/log/failed_xlog.log
exit 0
fi

send_archive_to_remote.sh
rsync -avzP /localarchive/* host:/archive/

Of course, now you have to deal with files that are removed from the slave
and making sure they get removed from the master appropriately, but, this
is fairly straightforward.

--Scott

> HTH
>
> > Please assist.
> >
> > Thanks,
> >
> > John Britto
> >
> >
> >
>
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres(dot)consulting(at)comcast(dot)net
> p: 312.241.7800
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
--
Scott Mead
Sr. Architect
*OpenSCG <http://openscg.com>*
http://openscg.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Mead 2017-09-27 19:48:05 Re: Rsync to a recovering streaming replica?
Previous Message Jerry Sievers 2017-09-27 18:55:41 Re: WAL Archive command.