From: | Andy <angelflow(at)yahoo(dot)com> |
---|---|
To: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Tutorials on high availability Postgresql setup? |
Date: | 2010-10-07 07:59:46 |
Message-ID: | 794798.90719.qm@web111305.mail.gq1.yahoo.com |
Lists: | pgsql-general |
Ah, thanks for the explanation. I was hoping for an automated setup without the need to get paged 24/7.
So HA is still as hard as I thought it would be. I was hoping that with 9.0 things would be easier.
--- On Thu, 10/7/10, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
> Subject: Re: [GENERAL] Tutorials on high availability Postgresql setup?
> To: "Andy" <angelflow(at)yahoo(dot)com>
> Cc: pgsql-general(at)postgresql(dot)org
> Date: Thursday, October 7, 2010, 3:24 AM
> On Thu, Oct 7, 2010 at 12:27 AM, Andy <angelflow(at)yahoo(dot)com> wrote:
> > Are there any tutorials or detailed instructions on how to set up HA
> > PostgreSQL & failover? The documentation
> > (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)
> > on this topic is pretty scarce.
> >
> > The scenario I'm most interested in is this:
> >
> > 2 servers - a master and a hot standby. All writes are sent to
> > master, reads are split between master and hot standby.
>
> To have true redundancy, you need 3 servers. Just saying. Otherwise,
> when one goes down, you have no more redundancy.
>
> > 1) If the hot standby goes down, how do I redirect reads to the master?
>
> Have a config file for your app that tells it where to go for reads
> and writes. Change the config file to point reads at a different db
> if a read slave fails. What constitutes a failed read slave is kind
> of a business decision, so you'll likely have to write your own code
> to decide what being down means.
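A minimal sketch of that app-side routing in Python, assuming the application holds its routing in an in-process object loaded from its config file. The `DbRouting` class, host names and weights are hypothetical. The `down` set is filled by whatever health check you write, since what "down" means is your own decision.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class DbRouting:
    """Hypothetical app-side routing config: one write master plus weighted read targets."""

    master: str
    # Relative read weights ("load factors"); include the master to let it serve reads too.
    read_weights: dict[str, float]
    # Hosts your own health check has declared down; what "down" means is your call.
    down: set[str] = field(default_factory=set)

    def write_host(self) -> str:
        # All writes go to the master.
        return self.master

    def read_candidates(self) -> dict[str, float]:
        # Keep only read targets that are not marked down and have a positive weight.
        live = {h: w for h, w in self.read_weights.items() if h not in self.down and w > 0}
        # If no read target is left, send reads to the master.
        return live or {self.master: 1.0}

    def read_host(self, rng: random.Random | None = None) -> str:
        # Pick a read target at random, in proportion to its weight.
        if rng is None:
            rng = random.Random()
        live = self.read_candidates()
        hosts = list(live)
        return rng.choices(hosts, weights=[live[h] for h in hosts], k=1)[0]
```

With `read_weights={"db1": 1.0, "db2": 3.0}`, about three reads in four go to db2; once db2 is added to `down`, every read goes to db1. Lowering the master's weight is one way to take read load off it.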
>
> > 2) If the master fails
> > -how do I automatically promote the standby to master and send all
> > reads/writes to the new master?
>
> First you need to decide if you actually want automated failovers.
> I've seen automated failovers cause as many problems as they were
> supposed to fix, but it can be done. Keep in mind that on a two-db
> system, failing over means you lose redundancy. If your cluster fails
> over on a lot of false positives, that's a lot of time with no
> redundancy. If your script isn't written with only one node in mind,
> it might try to fail over a second time with no read slave to promote
> to master.
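A minimal sketch of the guard described above, assuming a hypothetical in-memory `Cluster` record that the failover script consults. The script refuses to fail over when no standby is left to promote, which is exactly the state a two-node cluster is in after its first failover.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class FailoverRefused(Exception):
    """Raised when the (hypothetical) failover script has nothing safe to promote."""


@dataclass
class Cluster:
    master: str | None
    standbys: list[str] = field(default_factory=list)


def fail_over(cluster: Cluster) -> str:
    # On a two-node system, after the first failover there is no standby left,
    # so a second "failover" must stop and hand the problem to a human.
    if not cluster.standbys:
        raise FailoverRefused("no standby available to promote; page a human")
    new_master = cluster.standbys.pop(0)
    cluster.master = new_master
    return new_master
```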
>
> Also, you're going to have to come up with what constitutes a failed
> master. 30 seconds non-responsive? 5 minutes? An hour? If the
> problem is that the write master is simply overloaded, then failing
> over isn't gonna solve anything, as the newly promoted master is
> going to collapse as well under even heavier load. It might have been
> better to adjust the load factors used to determine where read queries
> go to take load off of the master, or to change a setting in your app
> that reduces load on the master. With an overloaded write master, then
> a failover, then an even more overloaded new write master, you've got
> a site down, no redundancy, and you need to rebuild your old master as
> a read slave to handle the load.
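A minimal sketch of one possible definition of "failed master": unresponsive continuously for some threshold, with any successful check resetting the clock. The class name and threshold are hypothetical; timestamps are passed in (for example from `time.monotonic()`) so the policy is easy to test.

```python
from __future__ import annotations


class MasterFailureDetector:
    """Declares the master failed only after `threshold_s` seconds of continuous
    unresponsiveness (a hypothetical policy; pick your own number)."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self._first_failure: float | None = None

    def record(self, now: float, responded: bool) -> bool:
        """Feed one health-check result; return True if the master counts as failed."""
        if responded:
            # Any successful check resets the clock.
            self._first_failure = None
            return False
        if self._first_failure is None:
            # Start timing at the first failed check in a row.
            self._first_failure = now
        return now - self._first_failure >= self.threshold_s
```

If your health check counts a slow answer as a response, an overloaded but still answering master never trips this detector. That matches the point above that failing over an overloaded master only moves the overload.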
>
> To start with, I do not recommend doing automatic failovers. Have a
> system in place where your DBA / SA can promote a slave to master in
> one or two easy steps, and if / when the master truly fails, then run
> that script. A human can make that decision with far more care than a
> piece of code.
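A minimal sketch of such a "one or two easy steps" script, under stated assumptions. It assumes a PostgreSQL 9.0 streaming-replication standby whose recovery.conf sets `trigger_file`, so the standby ends recovery and starts accepting writes once that file exists. It also assumes the script runs where that path is reachable (in practice, on the standby host) and that the application reads a hypothetical JSON config with `write` and `read` keys. The function name and config layout are made up for illustration.

```python
import json
from pathlib import Path


def promote_standby(trigger_file: Path, app_config: Path, new_master: str) -> None:
    """Hypothetical manual promotion, run by a human once the master has truly failed."""
    # Step 1: create the file named by trigger_file in the standby's recovery.conf;
    # the standby notices it, ends recovery and becomes the new master.
    trigger_file.touch()

    # Step 2: point the application's reads and writes at the new master.
    config = json.loads(app_config.read_text())
    config["write"] = new_master
    config["read"] = [new_master]
    # Write to a temp file and rename it so the app never sees a half-written config.
    tmp = app_config.with_suffix(".tmp")
    tmp.write_text(json.dumps(config, indent=2))
    tmp.replace(app_config)
```

Keeping both steps in one function means the human only has to decide *whether* to fail over, not remember *how*.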
>
> > -what happens when the old master comes back up? Do I need to do
> > anything to make it catch up to the new master?
>
> You can't let the old master come back up thinking it's the master
> as well. You have to re-establish replication to it as a slave.
> Again, this is usually not automated, at least not at first. The old
> master needs to be "shot in the head", so to speak, before it comes
> back up, or your app may start writing to it instead of, or as well
> as, the new master, and now you've got split-brain problems.
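A minimal sketch of the bookkeeping side of avoiding split-brain. It uses an epoch number (a "fencing token") that is bumped on every promotion. This is a swapped-in technique for illustration, not what is described above, and it does not replace actually fencing the old master (STONITH-style power-off or network isolation), because a record cannot stop a process that ignores it. `ClusterState` and its methods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical shared record of who is master, agreed on by the app and scripts."""

    master: str
    epoch: int  # bumped on every promotion

    def promote(self, node: str) -> int:
        self.master = node
        self.epoch += 1
        return self.epoch

    def role_on_restart(self, node: str) -> str:
        # A node that is not the recorded master must rejoin as a standby,
        # never start up as a second master.
        return "master" if node == self.master else "standby"

    def accept_write(self, node: str, node_epoch: int) -> bool:
        # Reject writes routed to a node holding a stale epoch (an old master).
        return node == self.master and node_epoch == self.epoch
```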
>
> In short, automated failover is complicated to get right, and if you
> get it wrong the cost of the consequences can be far worse than the 5
> or 10 minutes of downtime required for a manual switch-over. First
> write scripts that automate most of the task for your application and
> db farm. Test those scripts as much as you can on a test farm. Then
> run them by hand when things go wrong. If or when you're certain
> you've got all the bugs worked out and all the possible failure
> scenarios worked out, you can start testing automated failover.
>