From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Timeout and Synch Rep |
Date: | 2010-10-08 13:30:58 |
Message-ID: | AANLkTikPjD6ji461ckcgfG859KuzNMeQ9TAauwPAeat3@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Oct 8, 2010 at 4:50 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> In my effort to make the discussion around the design decisions of synch
> rep less opaque, I'm starting a separate thread about what has developed
> to be one of the more contentious issues.
>
> I'm going to champion timeouts because I plan to use them. In fact, I
> plan to deploy synch rep with a timeout if it's available within 2 weeks
> of 9.1 being released. Without a timeout (i.e. "wait forever" is the
> only mode), that project will probably never use synch rep.
>
> Let me give you my use-case so that you can understand why I want a timeout.
>
> Client is a telecommunications service provider. They have a primary
> server and a failover server for data updates. They also have two async
> slaves on older machines for reporting purposes. The failover
> currently does NOT accept any queries in order to keep it as current as
> possible.
>
> They would like the failover to be synchronous so that they can
> guarantee no data loss in the event of a master failure. However, zero
> data loss is less important to them than uptime ... they have a five-nines
> SLA with their clients, and the hardware on the master is very good.
>
> So, if something happens to the standby, and it cannot return an ack in
> 30 seconds, they would like it to degrade to asynch mode. At that
> point, they would also like to trigger a nagios alert which will wake up
> the sysadmin with flashing red lights. Once he has resolved the
> problem, he would like to promote the now-asynch standby back to synch
> standby.
>
> Yes, this means that, in the event of a standby failure, they have a
> window where any failure on the master will mean data loss. The user
> regards this risk as acceptable, given that both the master and the
> failover are located in the same data center in any case, so there is
> always a risk of a sufficient disaster wiping out all data back to the
> daily backup.
This explains very well why some systems require the timeout.
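
To make the requested behaviour concrete, here is a rough sketch of the
policy described above: wait for the standby's ack up to a timeout, and on
expiry degrade to asynch mode and raise an alert. This is only an
illustration under my own assumptions, not how PostgreSQL implements or
would implement it. The class name, method names, and the alert callback
are all hypothetical.

```python
import threading


class SyncRepWaiter:
    """Hypothetical model of a primary waiting for a standby ack.

    Not PostgreSQL code: every name here is made up for illustration.
    """

    def __init__(self, timeout_s=30.0, alert=print):
        self.timeout_s = timeout_s  # e.g. the 30 seconds from the use-case
        self.alert = alert          # stand-in for a nagios notification hook
        self.synchronous = True     # current replication mode

    def commit(self, ack: threading.Event) -> str:
        """Return how the commit was completed: sync, degraded, or async."""
        if not self.synchronous:
            # Already degraded: do not wait for the standby at all.
            return "async"
        if ack.wait(self.timeout_s):
            # The standby acknowledged within the timeout.
            return "sync"
        # No ack in time: fall back to asynch mode and wake up the sysadmin.
        self.synchronous = False
        self.alert("standby did not ack within %.1fs; degraded to async"
                   % self.timeout_s)
        return "degraded"

    def resync(self) -> None:
        """Called by the admin once the standby problem is resolved."""
        self.synchronous = True
```

Note that the alert fires only once, on the transition; later commits
return immediately until the admin switches the standby back to synch.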
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center