warm standby replication safe failover

From: Andrew Puschak <apuschak(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: warm standby replication safe failover
Date: 2013-02-02 16:22:07
Message-ID: CALFZoBvzPu=gt-j2mmnLJrWvA8+xndXwk2LE2bn2ApK9CDn02A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hi!

I'm new here and have a question. I'm working on a replacement PostgreSQL
8.4 setup for a small company. They now have a Primary postgres server and
2 secondaries which are replicated using Slony-I, however the consulting
company that set it up is afraid to failover and keeping Slony running is
taking lots of resources.

The server processes IVR phone calls so it must be quick and have high
availability.

I'm setting up a replacement group of physical servers that we can migrate
the data over to piece by piece. Since it is a small company, they've
agreed that Slony has been too complicated and that in an emergency
failover 1 to 5 minutes of lost data is acceptable and quickly having
another server available is more important. I can setup 3 separate servers
at 3 separate physical locations parallel to the current production servers.

I'm testing a setup with a primary server, which has archiving turned on
and rsync's over ssh the WAL log files to a secondary server. The secondary
server is running pg_standby and waiting for a trigger. A third server will
just accept nightly dumps of the primary to keep load down on the primary
using a cronjob. The third server will be up to 1 day old with the dumps,
but ready to go in case of primary and secondary failure as well as 1 day
historical dumps in case we need any old data.

My question is how to safely failover between Primary and Secondary servers
in this warm standby setup without loosing any data? I'd like to switch
back and forth periodically to ensure we can trigger a failover in an
emergency. Using the trigger file on the Secondary, we could loose up to 5
minutes of data since the WAL files are only rotated every 5 minutes, even
though archive_timeout = 60. I saw a blog that mentioned copying the
pg_xlog files from the Primary to the Secondary in addition to pg_standby,
but I'm unsure if this will make sure we have 100% of our data. I tried
this, put data in at each step and recovered all the data, but I'm unsure
if the process ensures that no data is lost.

Test setup: Primary was shipping logs to Secondary and secondary was
running pg_standby. I stop shipping logs by setting archive_command = ''
and reloading postgres on Primary. Then I create a table with data that
doesn't make it to the Secondary. On Primary I stopped postgres. Secondary
was still waiting for the next log, never receiving the new table. I
rsync'ed just the pg_xlog files using "pg_xlog/0*" to the Secondary pg_xlog
directory (not the WAL archive directory pg_standby will recover from).
These are the latest WAL log files and
"000000010000000000000003.00000020.backup" (should I exclude backup file?).
Then I triggered the recover. The new table is there and accessible.

My guess is that the logs in the WAL archive directory have been applied to
the database and then when the database starts up it also has the last logs
in the pg_xlog directory so it applies those last ones as well and we don't
loose any data. Can someone confirm or deny this or a better solution to
any of the above?

Thank you so much for your help,
Andrew

--
Andrew Puschak
(267) 614 - 2373
apuschak(at)gmail(dot)com
http://andrewpuschak.com

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2013-02-02 17:19:18 Re: warm standby replication safe failover
Previous Message anderson earley 2013-02-01 22:10:43 hello!!!