From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Aidan Van Dyk <aidan(at)highrise(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Issues with Quorum Commit |
Date: | 2010-10-08 08:18:23 |
Message-ID: | 4CAED3CF.7090503@enterprisedb.com |
Lists: | pgsql-hackers |
On 08.10.2010 01:25, Simon Riggs wrote:
> On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote:
>
>> To get "non-stale" responses, you can only query those k=3 servers.
>> But you've shot yourself in the foot because you don't know which
>> 3/10 those will be. The other 7 *are* stale (by definition). They
>> talk about picking the "caught up" slave when the master fails, but
>> you actually need to do that for *every query*.
>
> There is a big confusion around that point and I need to point out that
> statement isn't accurate. It's taken me a long while to understand this.
>
> Asking for k > 1 does *not* mean those servers are time synchronised.
> All it means is that the master will stop waiting after 3
> acknowledgements. There is no connection between the master receiving
> acknowledgements and the standby applying changes received from master;
> the standbys are all independent of one another.
>
> In a bad case, those 3 acknowledgements might happen say 5 seconds apart
> on the worst and best of the 3 servers. So the first standby to receive
> the data could have applied the changes ~4.8 seconds prior to the 3rd
> standby. There is still a chance of reading stale data on one standby,
> but reading fresh data on another server. In most cases the time window
> is small, but still exists.
>
> The other 7 are stale with respect to the first 3. But then so are the
> last 9 compared with the first one. The value of k has nothing
> whatsoever to do with the time difference between the master and the
> last standby to receive/apply the changes. The gap between first and
> last standby (i.e. N, not k) is the time window during which a query
> might/might not see a particular committed result.
>
> So standbys are eventually consistent whether or not the master relies
> on them to provide an acknowledgement. The only place where you can
> guarantee non-stale data is on the master.
Yes, that's a good point. Synchronous replication for load-balancing
purposes guarantees that when *you* perform a commit, after it finishes
it will be visible in all standbys. But if you run the same query across
different standbys, you're not guaranteed to get the same results. If you
just pick a random server for every query, you might even see time moving
backwards. Affinity is definitely a good idea for the load-balancing
scenario, but even then the anomaly is possible if you get re-routed to
a different server because the one you were bound to dies.
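
The "time moving backwards" anomaly can be illustrated with a small
sketch. This is a toy model, not PostgreSQL code: the Standby and
Session classes, the replayed counter, and the commit numbers are all
hypothetical names and values chosen for the example. It assumes each
standby has applied the master's commits up to some position, and that
a client can compare positions between reads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    name: str
    replayed: int  # highest commit number this standby has applied (made-up values)


def read(standby: Standby) -> int:
    """Return the newest commit visible on this standby."""
    return standby.replayed


def went_backwards(observations: List[int]) -> bool:
    """True if any read saw an older state than the read just before it."""
    return any(later < earlier
               for earlier, later in zip(observations, observations[1:]))


class Session:
    """Hypothetical client-side guard: remember the newest commit seen so far
    and refuse to read from a standby that has not replayed at least that far."""

    def __init__(self) -> None:
        self.high_water = 0

    def read(self, standby: Standby) -> Optional[int]:
        if standby.replayed < self.high_water:
            return None  # too stale for this session; caller should try another server
        self.high_water = standby.replayed
        return standby.replayed


# Three standbys of the same master, each at a different replay position.
standbys = [Standby("s1", 103), Standby("s2", 100), Standby("s3", 102)]

# Sending each successive query to a different server: s1, then s2, then s3.
naive = [read(s) for s in standbys]

# The same routing, but through a session that tracks what it has already seen.
session = Session()
guarded = [session.read(s) for s in standbys]
```

Here the unguarded reads observe commit 103 and then commit 100, so the
client sees an older state after a newer one. The guarded session
rejects the two lagging standbys instead of returning stale results.
Binding a client to one server avoids the comparison entirely, but as
noted above, a forced re-route after a failure brings the same problem
back unless something like this check is in place.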
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com