From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Smith <greg(at)2ndquadrant(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Issues with Quorum Commit |
Date: | 2010-10-21 00:49:06 |
Message-ID: | 201010210049.o9L0n6114296@momjian.us |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> > I don't see this as needing any implementation any more complicated than
> > the usual way such timeouts are handled. Note how long you've been
> > trying to reach the standby. Default to -1 for forever. And if you hit
> > the timeout, mark the standby as degraded and force them to do a proper
> > resync when they disconnect. Once that's done, then they can re-enter
> > sync rep mode again, via the same process a new node would have done so.
>
> Well, actually, that's *considerably* more complicated than just a
> timeout. How are you going to "mark the standby as degraded"? The
> standby can't keep that information, because it's not even connected
> when the master makes the decision. ISTM that this requires
>
> 1. a unique identifier for each standby (not just role names that
> multiple standbys might share);
>
> 2. state on the master associated with each possible standby -- not just
> the ones currently connected.
>
> Both of those are perhaps possible, but the sense I have of the
> discussion is that people want to avoid them.
>
> Actually, #2 seems rather difficult even if you want it. Presumably
> you'd like to keep that state in reliable storage, so it survives master
> crashes. But how you gonna commit a change to that state, if you just
> lost every standby (suppose master's ethernet cable got unplugged)?
> Looks to me like it has to be reliable non-replicated storage. Leaving
> aside the question of how reliable it can really be if not replicated,
> it's still the case that we have noplace to put such information given
> the WAL-is-across-the-whole-cluster design.
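A minimal Python sketch of the bookkeeping that the quoted timeout proposal implies, built around the two requirements Tom lists: a unique identifier per standby, and master-side state for every known standby, including disconnected ones. The class and method names are hypothetical illustrations, not PostgreSQL code. The state is deliberately kept in memory, so the sketch does not answer Tom's question of where that state could live reliably once every standby is unreachable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class StandbyState(Enum):
    SYNC = "sync"          # may acknowledge synchronous commits
    DEGRADED = "degraded"  # timed out; must do a full resync before rejoining


@dataclass
class StandbyRecord:
    standby_id: str                        # unique per standby, not a shared role name
    state: StandbyState = StandbyState.SYNC
    waiting_since: Optional[float] = None  # when the master began waiting for an ack


class StandbyRegistry:
    """Hypothetical master-side state for every known standby, connected or not.

    Kept in memory only: it does not solve where this state would live
    so that it survives a master crash.
    """

    def __init__(self, timeout_secs: float = -1):
        self.timeout_secs = timeout_secs  # -1 means wait forever
        self.standbys: Dict[str, StandbyRecord] = {}

    def register(self, standby_id: str) -> StandbyRecord:
        # A record outlives the connection, so a disconnected standby keeps its state.
        return self.standbys.setdefault(standby_id, StandbyRecord(standby_id))

    def start_wait(self, standby_id: str, now: float) -> None:
        rec = self.standbys[standby_id]
        if rec.state is StandbyState.SYNC and rec.waiting_since is None:
            rec.waiting_since = now

    def ack(self, standby_id: str) -> None:
        self.standbys[standby_id].waiting_since = None

    def check_timeouts(self, now: float) -> List[str]:
        """Mark standbys degraded once they exceed the timeout; return their IDs."""
        if self.timeout_secs < 0:
            return []
        degraded = []
        for rec in self.standbys.values():
            if (rec.state is StandbyState.SYNC
                    and rec.waiting_since is not None
                    and now - rec.waiting_since >= self.timeout_secs):
                rec.state = StandbyState.DEGRADED
                rec.waiting_since = None
                degraded.append(rec.standby_id)
        return degraded

    def finish_resync(self, standby_id: str) -> None:
        # Re-enter sync mode the same way a brand-new standby would.
        rec = self.standbys[standby_id]
        rec.state = StandbyState.SYNC
        rec.waiting_since = None
```

The current time is passed in explicitly to keep the logic testable; a real master would read a monotonic clock. The `standbys` dictionary is exactly the per-standby state that, per the objection above, would need reliable non-replicated storage.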

I assumed we would have a parameter called "sync_rep_failure" that would
take a command, and that command would be run when communication with the
slave was lost. If you restart, it tries again and might run the command
again.
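The "sync_rep_failure" idea can be sketched in Python as follows. Only the parameter name comes from this message. The class, the choice to append the standby's name as the command's last argument, and running the command once per outage are hypothetical illustrations, not an existing PostgreSQL feature.

```python
import shlex
import subprocess
from typing import Callable, List, Optional, Set


def _run(argv: List[str]) -> int:
    # Default runner: execute the command and return its exit status.
    return subprocess.run(argv, check=False).returncode


class SyncRepFailureHook:
    """Hypothetical sync_rep_failure setting holding a command to run."""

    def __init__(self, command: Optional[str],
                 runner: Callable[[List[str]], int] = _run):
        self.command = command
        self.runner = runner
        # Only in memory: after a restart this is empty again, so the
        # command may fire a second time for the same standby.
        self._fired: Set[str] = set()

    def on_connection_lost(self, standby_name: str) -> bool:
        """Run the command once per outage; return True if it was run."""
        if not self.command or standby_name in self._fired:
            return False
        # Assumption: the standby's name is appended as the last argument.
        self.runner(shlex.split(self.command) + [standby_name])
        self._fired.add(standby_name)
        return True

    def on_reconnect(self, standby_name: str) -> None:
        # A later outage should trigger the command again.
        self._fired.discard(standby_name)
```

Because nothing is persisted, a restarted master starts with an empty `_fired` set and may run the command again for a standby it already reported, which matches the restart behavior described above.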
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +