Re: Standalone synchronous master

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 23:17:34
Message-ID: 52D07F8E.4020501@agliodbs.com
Lists: pgsql-hackers

On 01/10/2014 02:59 PM, Joshua D. Drake wrote:
>
> On 01/10/2014 02:47 PM, Andres Freund wrote:
>
>> Really, the commits themselves are sent to the server at exactly the
>> same speed independent of sync/async. The only thing that's delayed is
>> the *notification* of the client that sent the commit. Not the commit
>> itself.
>
> Which is irrelevant to the point that if the standby goes down, we are
> now out of business.
>
> Any continuous replication should not be a SPOF. The current behavior
> guarantees that a two node sync cluster is a SPOF. The proposed behavior
> removes that.

Again, if that's your goal, then use async replication.

I really don't understand the use-case here.

The purpose of sync rep is to know definitively whether or not you
have lost data when disaster strikes. If knowing for certain isn't
important to you, then use async.

BTW, people are using RAID1 as an analogy to 2-node sync replication.
That's a very bad analogy, because in RAID1 you have a *single*
controller which is capable of determining if the disks are in a failed
state or not, and this is all happening on a single node where things
like network outages aren't a consideration. It's really not the same
situation at all.

Also, frankly, I absolutely can't count the number of times I've had to
rescue a customer or family member who had RAID1 but wasn't monitoring
syslog, and so one of their disks had been down for months without them
knowing it. Heck, I've done this myself.

So ... the filesystem geeks have already been through this. Filesystem
clustering started out with systems like DRBD, which includes an
auto-degrade option. However, DRBD with auto-degrade is widely
considered untrustworthy and is a significant portion of why DRBD isn't
trusted today.

From here, clustered filesystems went in two directions: RHCS added
layers of monitoring and management to make auto-degrade a safer option
than it is with DRBD (and still not the default option). Scalable
clustered filesystems added N(M) quorum commit in order to support more
than 2 nodes. Either of these courses is reasonable for us to pursue.
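
To make the quorum-commit option concrete, here is a minimal sketch in
Python. It assumes N(M) means "M confirmations required out of N nodes";
the function and node names are hypothetical and are not taken from any
PostgreSQL patch or clustered filesystem.

```python
def commit_confirmed(acked_nodes, all_nodes, required):
    """Return True once at least `required` distinct known nodes have acknowledged.

    Hypothetical helper for illustration only; not a PostgreSQL API.
    """
    nodes = set(all_nodes)
    if not 1 <= required <= len(nodes):
        raise ValueError("required must be between 1 and the number of nodes")
    # Count only acknowledgements that come from nodes in the cluster.
    return len(set(acked_nodes) & nodes) >= required


# 2-node sync pair where both must confirm: losing the standby blocks commits.
print(commit_confirmed({"master"}, {"master", "standby1"}, required=2))  # False

# 3 nodes where any 2 must confirm: one standby can fail without degrading.
print(commit_confirmed({"master", "standby2"},
                       {"master", "standby1", "standby2"}, required=2))  # True
```

With three nodes and a quorum of two, a single standby failure neither
blocks commits nor silently weakens the durability guarantee, which is
the property auto-degrade on a two-node cluster gives up.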

What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading. If
I'm wrong, then someone can point it out to me.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
