Hi, everyone. I'm working on a project that is already using
PostgreSQL 9.0, including streaming replication. I'm trying to help
them figure out a good architecture for ensuring stability and
failover under a variety of conditions, and wanted to ask the
community for suggestions and help.
Basically, they have a mission-critical application that talks to
PostgreSQL, and which works quite well. Because of the
mission-critical nature of the application, it has been implemented
twice, once at each data center. The two data centers are connected
via a network connection; one PostgreSQL server acts as the master,
and the other acts as a (read-only) slave. We're using pgpool in
the second data center (i.e., the one with the PostgreSQL
replication slave) to send all writes to the first data center
(i.e., the one with the PostgreSQL replication master), but to
balance reads across the two servers.
This all works really well. The automatic failover also works well,
such that when the master goes down, the slave is promoted to the
master, a bit of IP-address switching happens behind the scenes, and
things continue to hum along.
So far, so good. But we have a few questions:
- Once the slave has been promoted to master, we have a single
server, and a single point of failure. Is there any simple way
to get the former master to become a slave? I assume that it
would need to start the whole becoming-a-slave process from
scratch, invoking pg_start_backup(), copying files with rsync,
and then pg_stop_backup(), followed by connecting to the new
master. But perhaps there's a shorter, easier way for a "fallen
master" to become a slave?
- Is there any easy, straightforward way for a "fallen master"
to re-take its position, demoting the promoted slave back to its
original position of slave? (With little or no downtime, of
course.) I assume not, but I just wanted to check; my guess is
that you have to just make it a slave, and then start to follow
the newly promoted master.
- If the network connection between the two data centers goes
down, but if the computers are still up, we worry that we'll end
up with two masters -- the original master, as well as the
slave, which will (falsely) believe the master to be down, and
will thus promote itself to master. Given that PostgreSQL
doesn't allow master-master synchronization, we're thinking of
using a heartbeat to check if the other computer is available,
in both directions -- and that if the master cannot detect the
slave, then it goes into a read-only mode of some sort. Then,
when it detects the slave again, and can restart streaming, it
goes back into read-write mode. Is there a way (other than
Bucardo, which doesn't seem to fit the bill for this project),
is there any way for us to merge whatever diffs might be on the
two servers, and then reconnect them in master-slave streaming
mode when communication is re-established?
- Of course, Is there any easy way to do that? If so, then what
happens when pgpool tries forward an INSERT to the master while
it's in read-only mode? (For the record, I'm pretty sure that
there isn't any easy or obvious way to make a database
read-only, and that we can simulate read-only mode by adding
INSERT/UPDATE triggers on each of the four -- yes, only four --
tables in the database, silently ignoring data that's posted. I
floated this with the project managers, and they were OK with
this idea -- but I wanted to double-check whether this is a
viable solution, or if there's an obvious pitfall I'm missing
and/or a better way to go about this.
- If we use master-slave replication, and communication is cut
off, does the slave reconnect automatically? I believe that the
answer is "yes," and that the replication will continue so long
as we're in the defined window for replication delays.
Thanks for any suggestions and answers that you can provide. And of
course, if I've missed something obvious in the documentation, then
a pointer to the appropriate resource would be more than welcome.n
Reuven
--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner