Re: Issues with Quorum Commit

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-08 09:02:49
Message-ID: 4CAEDE39.2060803@bluegap.ch
Lists: pgsql-hackers

On 10/08/2010 01:44 AM, Greg Smith wrote:
> They'll use Sync Rep to maximize
> the odds a system failure doesn't cause any transaction loss. They'll
> use good quality hardware on the master so it's unlikely to fail.

.."unlikely to fail"?

Ehm.. is that you speaking, Greg? ;-)

> But
> when the database finds the standby unreachable, and it's left with the
> choice between either degrading into async rep or coming to a complete
> halt, you must give people the option of choosing to degrade instead
> after a timeout. Let them set off the red flashing lights, sound the
> alarms, and pray the master doesn't go down until you can fix the
> problem.

Okay, okay, fair enough - if there had been red flashing lights. And
alarms. And bells and whistles. But that's what I'm afraid the timeout
is removing.

> I don't see this as needing any implementation any more complicated than
> the usual way such timeouts are handled. Note how long you've been
> trying to reach the standby. Default to -1 for forever. And if you hit
> the timeout, mark the standby as degraded

..and how do you make sure you are not marking your second standby as
degraded just because it's currently lagging? That would effectively
degrade the one you utterly need, because your first standby has just
bitten the dust.
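
To make that concern concrete, here is a minimal sketch in Python. All
names (Standby, apply_timeout, last_ack) are hypothetical and this is not
PostgreSQL code; it only assumes the quoted proposal: track how long each
standby has been unreachable, treat -1 as "wait forever", and mark the
standby as degraded once the timeout is hit. A naive check like this
cannot tell a dead standby from one that is merely lagging.

```python
from dataclasses import dataclass

FOREVER = -1  # per the quoted proposal: -1 means never time out


@dataclass
class Standby:
    name: str
    last_ack: float        # time (in seconds) of the last acknowledgement
    degraded: bool = False


def apply_timeout(standbys, now, timeout):
    """Mark each standby whose last ack is older than `timeout` as degraded.

    Returns the names of the standbys still treated as synchronous.
    """
    for s in standbys:
        if timeout != FOREVER and now - s.last_ack > timeout:
            s.degraded = True
    return [s.name for s in standbys if not s.degraded]


# standby1 died at t=0; standby2 is alive but its last ack is 12 s old.
standbys = [
    Standby("standby1", last_ack=0.0),
    Standby("standby2", last_ack=18.0),
]
in_sync = apply_timeout(standbys, now=30.0, timeout=10.0)

# Same situation with the default of -1: nothing is ever degraded.
patient = [
    Standby("standby1", last_ack=0.0),
    Standby("standby2", last_ack=18.0),
]
still_sync = apply_timeout(patient, now=30.0, timeout=FOREVER)
```

With a 10 second timeout both standbys end up degraded, including the
lagging one that was the last remaining copy. With -1 the master keeps
waiting on both, which is the complete halt the timeout was meant to avoid.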

And how do you prevent a split-brain situation in case the master dies
shortly after these events, but fails to come up again immediately?

Your list of data recovery projects will get larger and the projects
more complicated. Because there's a lot more to it than just the
implementation of a timeout.

Regards

Markus Wanner
