Re: Fully-automatic streaming replication failover when master dies?

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Dmitry Koterov <dmitry(dot)koterov(at)gmail(dot)com>
Cc: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>, Susan Cassidy <susan(dot)cassidy(at)decisionsciencescorp(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fully-automatic streaming replication failover when master dies?
Date: 2014-01-26 04:50:07
Message-ID: CAOR=d=0hGkbNPAwbde-h_8hRAW9HT-h_Mg-ih=+xmX0rG9AD9Q@mail.gmail.com
Lists: pgsql-general

Please don't top post in technical discussions.

On Sat, Jan 25, 2014 at 11:29 AM, Dmitry Koterov
<dmitry(dot)koterov(at)gmail(dot)com> wrote:
>
> On Friday, January 24, 2014, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>>
>> On Thu, Jan 23, 2014 at 7:16 PM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
>> wrote:
>> >
>> >
>> > On Fri, Jan 24, 2014 at 1:38 AM, Susan Cassidy
>> > <susan(dot)cassidy(at)decisionsciencescorp(dot)com> wrote:
>> >>
>> >> pgpool-II may do what you want. Lots of people use it.
>> >
>> >
>> > I don't think pgpool re-adds a lost node on its own (once the node is
>> > live or available again). Plus, if you have 3-node replication, you need to
>> > write your own failover_command (as a shell script) that changes the master
>> > node for the 2nd secondary when one of the secondary servers is
>> > promoted to primary. I hope things will get easier with version 9.4 (I guess
>> > in 9.4 one won't have to rebuild a master node from backup; if the WAL files
>> > are available it will just roll forward).
>> >
>> >> > for all the machines). At least MongoDB does the work well, and with
>> >> > almost
>> >> > zero configuration.
>> >> Mongo's data guarantees are, um, somewhat less robust than
>> >> PostgreSQL's.
>> >
>> >
>> > I don't think this has anything to do with data reliability or ACID
>> > property (if that is what you are referring to).
>> >
>> >> Failover is easy if you don't have to be exactly right.
>> >
>> >
>> > IMHO that's not a fair point. PostgreSQL supports sync replication (as
>> > well as async); does that complicate the failover process compared with
>> > async replication? I guess what he is asking for is automation of whatever
>> > features PostgreSQL already supports.
>>
>> No, it's a fair point. When you go from "we promise to try and not lose
>> your data" to "we promise to not lose any of your data" the situation
>> is much different.
>>
>> There are many things to consider in the postgresql situation. Is it
>> more important to keep your application up and running, even if only
>> in read only mode? Is performance more important than data integrity?
>> How many nodes do you have? How many can auto-fail over before you
>> auto-fail over to the very last one? How do you rejoin failed nodes,
>> one at a time, all at once, by hand, automagically? And so on. There
>> are a LOT of questions to ask that mongo already decided for you, and
>> the decision was that if you lose some data that's OK as long as the
>> cluster stays up. With PostgreSQL the decision making process probably
>> has a big impact on how you answer these types of questions and how
>> you fail over.
>>
>> Add to that that most PostgreSQL database servers are VERY robust,
>> with multi-lane RAID array controllers and / or sturdy SANs underneath
>> them, and their failure rates are very low, so you run the risk of your
>> auto-failover causing as much of an outage as the server failing, since
>> most failovers are going to cause some short interruption in service.
>> It's not a simple push a button take a banana, one size fits all
>> problem and solution.

> Failover is mostly NOT about RAID or SAN robustness. It's about
> datacenter connectivity and network issues. If you lose one datacenter (it
> happens, and there is no aid for it), you should redirect all traffic to
> another DC ASAP and failover the master DB to it. When the disconnected DC
> is up again, it should recover from this situation.
>
> So +1 for the previous man, PostgreSQL ACID and MongoDB non-ACID have
> absolutely no relevance to the failover problem.

If you'll bother reading what I wrote AGAIN, you'll notice my mention
of ACID etc. was more of an afterthought here. There are real questions
about data loss and recovery that matter when you are failing over.
Are you running your cluster in synchronous mode across geographically
diverse data centers? If not, how long do you wait for the master to
come back before you fail over? A millisecond? A second? A minute? The
answer will likely be different for me than for you.
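
To make the waiting-time question concrete, here is a minimal sketch in
Python. It is illustrative only: should_fail_over, master_is_alive, and
grace_period_s are hypothetical names, not part of PostgreSQL or any
failover tool, and the health check itself is left to the caller. It only
shows that the grace period is a policy knob each site has to pick.

```python
import time
from typing import Callable


def should_fail_over(
    master_is_alive: Callable[[], bool],  # hypothetical health check supplied by the caller
    grace_period_s: float,                # how long to wait for the master to come back
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the master stays unreachable for the whole grace period."""
    deadline = clock() + grace_period_s
    while True:
        if master_is_alive():
            # The master answered before the deadline: no promotion needed.
            return False
        if clock() >= deadline:
            # Still unreachable after the full grace period: fail over.
            return True
        sleep(poll_interval_s)
```

A session store might call this with a grace period of a second or two,
while a database handling money might wait much longer or hand the
decision to a human.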

While ACID isn't the main or only reason for things being different,
it IS a valid reason because different people use PostgreSQL for
different things. If I'm running it as a session server, I treat it
one way, as a key-value store another, as a transactional database
handling monetary funds yet another. Your refusal to accept that
this is a complex issue with complex answers isn't helping you find
the right answer to your problem.

--
To understand recursion, one must first understand recursion.
