Re: Failover architecture

From:	John R Pierce <pierce(at)hogranch(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Failover architecture
Date:	2011-08-17 16:01:00
Message-ID:	4E4BE5BC.2070106@hogranch.com
Lists:	pgsql-general

On 08/17/11 6:25 AM, Reuven M. Lerner wrote:
>
> * Once the slave has been promoted to master, we have a single
> server, and a single point of failure. Is there any simple way to
> get the former master to become a slave? I assume that it would
> need to start the whole becoming-a-slave process from scratch,
> invoking pg_start_backup(), copying files with rsync, and then
> pg_stop_backup(), followed by connecting to the new master. But
> perhaps there's a shorter, easier way for a "fallen master" to
> become a slave?
>

Nope, that's pretty much what you have to do. If you use rsync and
the files haven't changed too much, the resync should be relatively
fast, since rsync only transfers what differs.
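
A rough sketch of that rebuild sequence, for illustration only. It just
builds the commands rather than running them. The host name, data
directory, and replication user are placeholders I made up, and the
recovery.conf settings assume the 9.x releases current at the time of
writing. Stop the old master before copying over its data directory.

```python
def resync_plan(new_master, data_dir, repl_user="replicator"):
    """Return (commands, recovery_conf) for rebuilding the old master
    as a standby of new_master. Nothing is executed here."""
    # psql -A (unaligned) -t (tuples only) -c (run one command)
    psql = ["psql", "-h", new_master, "-U", "postgres", "-Atc"]
    steps = [
        # 1. Tell the new master a base backup is starting.
        psql + ["SELECT pg_start_backup('resync', true)"],
        # 2. Pull the data directory; rsync skips unchanged files.
        ["rsync", "-a", "--delete", "--exclude=postmaster.pid",
         f"{new_master}:{data_dir}/", f"{data_dir}/"],
        # 3. Tell the new master the base backup is finished.
        psql + ["SELECT pg_stop_backup()"],
    ]
    # 4. Written into data_dir on the old master before starting it.
    recovery_conf = (
        "standby_mode = 'on'\n"
        f"primary_conninfo = 'host={new_master} user={repl_user}'\n"
    )
    return steps, recovery_conf


steps, conf = resync_plan("db2.example.com", "/var/lib/pgsql/data")
for cmd in steps:
    print(" ".join(cmd))
print(conf)
```

The standby also needs the WAL generated during the copy, either from
a WAL archive or by keeping enough segments on the new master.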

> * Is there any easy, straightforward way for a "fallen master" to
> re-take its position, demoting the promoted slave back to its
> original position of slave? (With little or no downtime, of
> course.) I assume not, but I just wanted to check; my guess is
> that you have to just make it a slave, and then start to follow
> the newly promoted master.
>

what you said.

> * If the network connection between the two data centers goes down,
> but if the computers are still up, we worry that we'll end up with
> two masters -- the original master, as well as the slave, which
> will (falsely) believe the master to be down, and will thus
> promote itself to master. Given that PostgreSQL doesn't allow
> master-master synchronization, we're thinking of using a heartbeat
> to check if the other computer is available, in both directions --
> and that if the master cannot detect the slave, then it goes into
> a read-only mode of some sort. Then, when it detects the slave
> again, and can restart streaming, it goes back into read-write
> mode. Is there any way (other than Bucardo, which doesn't seem to
> fit the bill for this project) for us to merge whatever diffs
> might be on the two servers, and then reconnect them in
> master-slave streaming mode when communication is
> re-established?
>

This is problematic in any sort of cluster system: you end up with two
versions of 'the truth' and you have to figure out how to reconcile them.
It absolutely won't work with streaming replication, which requires
the two servers to be block-by-block the same. If you have to deal
with this sort of thing, you may want to do your OWN replication at the
application level, perhaps using some sort of messaging environment
where you can queue up the pending "change requests".
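
To make the queueing idea concrete, here is a minimal sketch of an
in-memory change queue. The ChangeQueue class and the send callable are
hypothetical names of mine, not part of any messaging product. A real
setup would use a durable queue, and you would still have to decide how
to reconcile conflicting changes made on both sides.

```python
from collections import deque


class ChangeQueue:
    """Hold change requests while the peer site is unreachable and
    replay them in order once it is reachable again."""

    def __init__(self, send):
        # send(change) must return True if the peer accepted the change.
        self._send = send
        self._pending = deque()

    def submit(self, change):
        """Queue a change and try to deliver everything queued so far."""
        self._pending.append(change)
        return self.flush()

    def flush(self):
        """Deliver queued changes oldest first; stop at the first
        failure so ordering is preserved. Returns the number sent."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```

While the link is down, submit() just accumulates changes. Calling
flush() after the heartbeat reports the peer as back drains the queue.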

> * Of course, is there any easy way to do that? If so, then what
> happens when pgpool tries to forward an INSERT to the master while
> it's in read-only mode? (For the record, I'm pretty sure that
> there isn't any easy or obvious way to make a database read-only,
> and that we can simulate read-only mode by adding INSERT/UPDATE
> triggers on each of the four -- yes, only four -- tables in the
> database, silently ignoring data that's posted. I floated this
> with the project managers, and they were OK with this idea -- but
> I wanted to double-check whether this is a viable solution, or if
> there's an obvious pitfall I'm missing and/or a better way to go
> about this.)
>

that sounds messy.
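
One less messy possibility, offered as a sketch rather than a tested
recipe: PostgreSQL has a default_transaction_read_only setting that can
be set per database. Writes then fail with an error instead of being
silently dropped, so pgpool and the client at least find out. It is
advisory only, since a session can turn it off again, and it only
applies to new sessions. The function names and the heartbeat threshold
below are assumptions of mine.

```python
def want_read_only(missed_heartbeats, threshold=3):
    """Go read-only once the peer has missed `threshold` heartbeats
    in a row (the threshold is an arbitrary example value)."""
    return missed_heartbeats >= threshold


def read_only_statement(dbname, read_only):
    """Build the SQL that flips the per-database default.
    The database name is quoted as an identifier."""
    quoted = '"' + dbname.replace('"', '""') + '"'
    value = "on" if read_only else "off"
    return f"ALTER DATABASE {quoted} SET default_transaction_read_only = {value};"


# Example: three missed heartbeats, so switch the database to read-only.
print(read_only_statement("app", want_read_only(3)))
```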

> * If we use master-slave replication, and communication is cut off,
> does the slave reconnect automatically? I believe that the answer
> is "yes," and that the replication will continue so long as we're
> in the defined window for replication delays.
>

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

In response to

Failover architecture at 2011-08-17 13:25:29 from Reuven M. Lerner
