Hi Michael, thanks for your reply.

I discussed this with my colleague, and we decided to change the archive_command to execute a shell script.

#!/bin/bash
# archive_command script to replicate archivelogs to standby server slaves
#
# postgresql.conf parameter
#
# archive_command = '<$PGDATA>/replica_achive_set.sh "%p" "%f"'
#
set -e
set -u
ARCHIVE1="/mnt/server/slave1_archivedir"
ARCHIVE2="/mnt/server/slave2_archivedir"
echoerr() { echo "$@" 1>&2; }
if [ -f "${ARCHIVE1}/$2" ] && [ -f "${ARCHIVE2}/$2" ] ; then
echoerr "Archive file $2 already exists in both replicated archive sets, skipping"
exit 0
fi
FAIL=0
# Run both rsync copies in the background and remember their PIDs
/usr/bin/rsync -aq "$1" "${ARCHIVE1}/$2" & pid_1=$!
/usr/bin/rsync -aq "$1" "${ARCHIVE2}/$2" & pid_2=$!
echoerr "Spawned replication processes $pid_1 AND $pid_2"
wait "$pid_1" || FAIL=$((FAIL + 1))
wait "$pid_2" || FAIL=$((FAIL + 1))
if [ "$FAIL" -eq 0 ];
then
echoerr "Replication success $1 $2"
else
echoerr "Replication failed $1 $2"
# A non-zero exit status tells PostgreSQL the segment was not archived, so it retries it later
exit 1
fi

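This copies the archived WAL files from the master into a separate archive directory for each slave. Will that avoid the problem you mentioned, where one slave's archive_cleanup_command removes WAL files that the other slave still needs?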
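Here is how I understand the cleanup to behave, as a rough bash model (my reading of pg_archivecleanup, not the real tool): it only looks inside the directory it is given, only considers 24-hex-digit WAL segment names, and removes a segment whose name, ignoring the 8-digit timeline prefix, sorts before the %r file's. The helper name would_clean and the sample segment names are made up; the sketch only prints candidates and deletes nothing.

```shell
#!/bin/bash
# Rough model of pg_archivecleanup's selection rule (an assumption, not the real tool).
# It only prints candidate files; nothing is deleted.
set -eu
LC_ALL=C   # byte-wise string comparison for [[ a < b ]]

# would_clean ARCHIVE_DIR RESTARTPOINT_FILE   (hypothetical helper)
would_clean() {
    local dir=$1 cutoff=$2 path name
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        # Only plain WAL segment names (24 hex digits) are considered
        [[ $name =~ ^[0-9A-F]{24}$ ]] || continue
        # Compare without the 8-digit timeline prefix
        if [[ ${name:8} < ${cutoff:8} ]]; then
            echo "$name"
        fi
    done
}

# Demo: two throwaway directories standing in for the slave archive dirs
slave1=$(mktemp -d)
slave2=$(mktemp -d)
for seg in 000000010000000000000001 000000010000000000000002 000000010000000000000003; do
    touch "$slave1/$seg" "$slave2/$seg"
done
# Pretend slave1's %r is segment ...02: only ...01 qualifies, and only inside slave1's dir
would_clean "$slave1" 000000010000000000000002   # prints 000000010000000000000001
rm -rf "$slave1" "$slave2"
```

Because each slave's cleanup command names only its own archive directory, slave1's cleanup never looks at slave2's copies.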

I should then be able to use these recovery.conf files:

slave #1

standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave1 sslmode=require'
restore_command = 'cp /mnt/server/slave1_archivedir/%f "%p"'
archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave1_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'

slave #2

standby_mode = 'on'
primary_conninfo = 'host=<master database ip address> port=5432 dbname=tumsdb user=replication password=<password> application_name=slave2 sslmode=require'
restore_command = 'cp /mnt/server/slave2_archivedir/%f "%p"'
archive_cleanup_command = 'pg_archivecleanup /mnt/server/slave2_archivedir/ %r'
trigger_file = '/opt/PostgreSQL/9.3/data/pgsql.trigger.file'
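To check the %p quoting Michael pointed out, here is a rough bash sketch of how I understand the server expands the placeholders in restore_command: %p, %f and %r are substituted and %% becomes a single %. Passing any other % through unchanged is my assumption, and the file names used are made-up sample values.

```shell
#!/bin/bash
# Sketch of placeholder expansion in restore_command (an assumption, not PostgreSQL code).
# expand_restore_command CMD P F R   (hypothetical helper)
expand_restore_command() {
    local cmd=$1 p=$2 f=$3 r=$4 out="" i c next
    for (( i = 0; i < ${#cmd}; i++ )); do
        c=${cmd:i:1}
        if [[ $c == "%" ]]; then
            next=${cmd:i+1:1}
            case $next in
                p) out+=$p; i=$((i + 1)) ;;
                f) out+=$f; i=$((i + 1)) ;;
                r) out+=$r; i=$((i + 1)) ;;
                %) out+="%"; i=$((i + 1)) ;;
                *) out+="%" ;;   # unrecognised escape: keep the % as-is (assumed)
            esac
        else
            out+=$c
        fi
    done
    printf '%s\n' "$out"
}

# The "%p%" form leaves a stray % in the destination file name:
expand_restore_command 'cp /mnt/server/slave1_archivedir/%f "%p%"' pg_xlog/RECOVERYXLOG 000000010000000000000003 ''
# prints: cp /mnt/server/slave1_archivedir/000000010000000000000003 "pg_xlog/RECOVERYXLOG%"
expand_restore_command 'cp /mnt/server/slave1_archivedir/%f "%p"' pg_xlog/RECOVERYXLOG 000000010000000000000003 ''
# prints: cp /mnt/server/slave1_archivedir/000000010000000000000003 "pg_xlog/RECOVERYXLOG"
```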

Does this look correct?

Finally, a question about the base backup.

I ran pg_ctl reload to change the archive destination from /mnt/server/master_archivedir to the separate slave1 and slave2 archive directories. Do I need to redo this base backup step?

psql -c "select pg_start_backup('initial_backup');"
rsync -cvar --inplace --exclude='*pg_xlog*' /u01/fiber/postgreSQL_data/ postgres@1.2.3.5:/u01/fiber/postgreSQL_data/
psql -c "select pg_stop_backup();"

Or can I just copy all of the missing archived WAL files from /mnt/server/master_archivedir to the slaves, and then restart the slaves in recovery mode?
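In case the copy route turns out to be valid (that is exactly what I am asking), a minimal bash sketch of the copy step itself, assuming it is enough to copy every file present in the old master archive directory but missing from a slave's archive directory, without overwriting anything already there. The helper name copy_missing is hypothetical.

```shell
#!/bin/bash
# Copy files that exist in SRC but not yet in DST; never overwrite existing files.
set -eu

# copy_missing SRC_DIR DST_DIR   (hypothetical helper)
copy_missing() {
    local src=$1 dst=$2 path name copied=0
    for path in "$src"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        if [ ! -e "$dst/$name" ]; then
            cp -p "$path" "$dst/$name"   # -p keeps mode and timestamps
            copied=$((copied + 1))
        fi
    done
    echo "copied $copied file(s) from $src to $dst"
}

# Usage against the directories named above (not run here):
# copy_missing /mnt/server/master_archivedir /mnt/server/slave1_archivedir
# copy_missing /mnt/server/master_archivedir /mnt/server/slave2_archivedir
```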

thanks

-------- Original Message --------
Subject: Re: [BUGS] Having trouble configuring a Master with multiple standby Servers in PostgreSQL 9.3.3
From: Michael Paquier <michael.paquier@gmail.com>
Date: Wed, April 16, 2014 6:07 pm
To: fburgess@radiantblue.com
Cc: pgsql-bugs@postgresql.org


On Thu, Apr 17, 2014 at 1:29 AM, <fburgess@radiantblue.com> wrote:
> Now the issue is with the recovery.conf file on slave1, should the
> restore_command point to the archivelogs on the master?
Yes, this is where archive_command of master copies the WAL files. You need them for recovery operations on slaves.

> Do I run the archive_cleanup_command when I recover slave1 or do I wait
> until I have finished backup/copy from the slave2
Be careful here: this command may remove WAL files that other slaves still need. For example, if slave1 runs this command, it may remove files that slave2 still needs because slave2 has not yet done any recovery.

> postgresql.conf - Slave1
> restore_command = 'cp /mnt/server/master_archivedir/%f "%p%"' <--- ****
> Is this correct! **** The master remains on-line and is producing archive
> logs.
No need to have that much complexity for %p:
restore_command = 'cp -i /mnt/server/master_archivedir/%f %p'

> postgresql.conf - Slave2 Server VM
> restore_command = 'cp /mnt/server/slave2_archivedir/%f "%p%"' <--- ****
> Is this correct! **** The master remains on-line and is producing archive
> logs.
Please see above, it could be simpler.
--
Michael