Re: Reliable WAL file shipping over unreliable network

From: Nagy László Zsolt <gandalf(at)shopzeus(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Reliable WAL file shipping over unreliable network
Date: 2018-02-28 17:53:53
Message-ID: 954e01b6-b240-eb9a-5feb-8efd202fef5a@shopzeus.com
Lists: pgsql-admin


>
> Just use "-ac"; you want the -c option to ensure there is no data
> corruption during the transfer.  Do not delete the file; let Postgres
> manage that.
>
> Here is a snippet from a script I use for archiving.  You also want to
> make sure your script returns failure or success correctly.
>
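> # SSH command and options (not used in this snippet)
> SSH_CMD="ssh -o ServerAliveInterval=20 $ARCH_SERVER"
> STS=3
>
> # Copy the WAL file with checksum comparison (-c); the remote side
> # creates the archive directory first if it does not exist.
> OUTPUT=$(rsync -ac --rsync-path="mkdir -p $ARCH_DIR && rsync" \
>     "$XLOGFILE" "$ARCH_SERVER:$ARCH_DIR/$WALFILE")
> if [ $? -eq 0 ]; then
>    STS=0
> fi
>
> exit $STS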
>
Thanks for the script. So I need to use this on the master side in
archive_command. It ensures that PostgreSQL keeps retrying a partially
transferred file until the transfer succeeds. Unfortunately, I cannot
use it in my Docker container, which is isolated and contains neither
SSH nor rsync. But I get the idea, and I can come up with a simple script
that runs on the host machine and transfers these files reliably to the
slave side.
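
The kind of host-side script I have in mind could look roughly like
this. It is only a sketch under assumptions: the function name ship_wal
and its arguments are made up, and it assumes the destination directory
is reachable as a local path (for example a mounted share). The idea is
to copy under a temporary name, verify the copy, and only then rename
it, so a partial file never appears under its final name.

```shell
#!/bin/sh
# Sketch: copy a WAL file so that it appears under its final name only
# when it is complete. ship_wal and its arguments are hypothetical names.
ship_wal() {
    src=$1        # full path of the WAL file to ship
    dest_dir=$2   # destination directory (assumed local or mounted path)
    name=$(basename "$src")
    tmp="$dest_dir/.$name.tmp"

    # Already shipped: succeed only if the existing copy is identical,
    # and never overwrite it.
    if [ -e "$dest_dir/$name" ]; then
        cmp -s "$src" "$dest_dir/$name"
        return $?
    fi

    # Copy under a temporary name, then verify byte for byte.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    cmp -s "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # A rename within one file system is atomic, so a reader sees either
    # no file or the complete file.
    mv "$tmp" "$dest_dir/$name"
}
```

Any non-zero return would be passed back as the exit status of
archive_command, so the same file gets retried later.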

But I still don't understand what happens on the slave side when the
slave tries to use a partially transferred WAL file. I have this
recovery.conf on the slave side:

standby_mode='on'
primary_conninfo='host=postgres-master port=5432 user=not_telling password=not_telling'
trigger_file='/backup/trigger'
restore_command = 'cp /transfer/%f %p'
archive_cleanup_command = 'pg_archivecleanup /transfer %r'

So what happens when the slave postgres executes restore_command on a
WAL file that was only partially transferred? PostgreSQL cannot test the
file for completeness before it has been copied to pg_wal. The restore
command itself cannot tell whether the file is complete, either. What
does PostgreSQL do when it sees an incomplete file in pg_wal? Does it
detect that the file is incomplete and execute the restore_command
again? When? How often?
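
The only check I can think of for a restore_command wrapper is the file
size. The following is a sketch under assumptions, not something taken
from the documentation: restore_wal is a made-up name, and 16777216
assumes the default 16 MB WAL segment size. It would only catch
truncated files, not corrupted ones, and it leaves files that are not
plain WAL segments (such as .history files) unchecked.

```shell
#!/bin/sh
# Sketch: restore a WAL segment only if it has the full expected size.
# restore_wal is a hypothetical name; 16777216 assumes the default
# 16 MB segment size.
restore_wal() {
    src=$1                      # e.g. /transfer/%f
    dst=$2                      # e.g. %p
    expected=${3:-16777216}     # expected segment size in bytes

    [ -f "$src" ] || return 1   # not there yet: report failure

    name=$(basename "$src")
    case $name in
        *[!0-9A-F]*)
            # Contains a non-hex character, so it is not a plain
            # segment name: no size check.
            ;;
        ????????????????????????)
            # Exactly 24 hex digits: treat it as a WAL segment.
            size=$(wc -c < "$src" | tr -d ' ')
            [ "$size" -eq "$expected" ] || return 1
            ;;
    esac

    cp "$src" "$dst"
}
```

With a script that ends by calling restore_wal "$@", the setting would
become something like restore_command = '/some/path/restore_wal.sh
/transfer/%f %p', where the script path is a placeholder.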

It seems inefficient to execute the restore_command multiple times just
to find out that the file is not yet complete, but if this is how it
works, then it is fine with me. But does it really work that way? I
don't see it documented (or maybe I'm looking in the wrong place).

Sorry for the many questions. I need to understand every detail of this
before we do it in production.
