From: | Jan Wieck <JanWieck(at)Yahoo(dot)com> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Theo Schlossnagle <jesus(at)omniti(dot)com>, Jim Nasby <decibel(at)decibel(dot)org> |
Subject: | Re: Proposal: Commit timestamp |
Date: | 2007-02-04 15:06:27 |
Message-ID: | 45C5F673.2070307@Yahoo.com |
Lists: | pgsql-hackers |
On 2/4/2007 3:16 AM, Peter Eisentraut wrote:
> Jan Wieck wrote:
>> This is all that is needed for last update wins resolution. And as
>> said before, the only reason the clock is involved in this is so that
>> nodes can continue autonomously when they lose connection without
>> conflict resolution going crazy later on, which it would do if they
>> were simple counters. It doesn't require microsecond synchronized
>> clocks and the system clock isn't just used as a Lamport timestamp.
>
> Earlier you said that "one assumption is that all servers in the
> multimaster cluster are ntp synchronized", which already rang alarm
> bells for me. Now that I read this, you appear to require
> synchronization not on the microsecond level but on some level. I
> think that would be pretty hard to manage for an administrator, seeing
> that NTP typically cannot provide such guarantees.
Some degree of synchronization is wanted to avoid totally unexpected
behavior. The conflict resolution algorithm itself can live perfectly
well with plain counters, but I guess you wouldn't want the result.
Suppose you update a record on one node, then 10 minutes later you
update the same record on another node. Unfortunately, the nodes had no
communication, and because the first node is much busier, its counter is
way advanced. This would mean the update made 10 minutes later gets
lost in the conflict resolution when the nodes reestablish
communication. They would have the same data at the end, just not what
any sane person would expect.
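
To make the counter-only scenario concrete, here is a minimal sketch in Python. It is an illustration only, not the proposed implementation: `RowVersion`, `last_update_wins`, the node ids, and the counter values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    value: str
    counter: int  # hypothetical per-node commit counter, no clock involved
    node_id: int  # assumed tie-breaker so every node picks the same winner


def last_update_wins(a: RowVersion, b: RowVersion) -> RowVersion:
    """Return the version with the larger (counter, node_id) pair."""
    return a if (a.counter, a.node_id) > (b.counter, b.node_id) else b


# Busy node 1: its counter is already far advanced when the row is updated.
first = RowVersion("updated first, on the busy node", counter=50_000, node_id=1)

# Idle node 2: updates the same row 10 minutes later, but its counter is low.
later = RowVersion("updated 10 minutes later, on the idle node", counter=120, node_id=2)

# Both nodes resolve the conflict identically once they reconnect ...
winner = last_update_wins(first, later)

# ... but the update that happened later in wall-clock time is the one lost.
print(winner.value)
```

Both nodes converge on the same row, which is all the algorithm promises; the surviving row is simply the one from the busier node rather than the most recent one.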
This behavior will kick in whenever the conflicting cross-node updates
happen close enough together that the time difference between the clocks
can affect it. So if you update the same logical row on two nodes within
a tenth of a second, and the clocks are more than that apart, the
conflict resolution can result in the older row surviving. Clock
synchronization is simply used to minimize this.
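
The effect of clock skew can be sketched with a little arithmetic. The helper `stamp` and the offsets below are assumptions made up for the illustration; times are in integer milliseconds.

```python
def stamp(true_time_ms: int, clock_offset_ms: int) -> int:
    """Timestamp a node would record: true time plus that node's clock error."""
    return true_time_ms + clock_offset_ms


# Node A's clock runs 300 ms fast; node B's clock is accurate (assumed values).
older = stamp(10_000, 300)  # update on A at true time 10.000 s
newer = stamp(10_100, 0)    # update on B, 100 ms later in true time

# The updates are closer together than the skew, so the older row "wins".
older_survives = older > newer

# Ten minutes apart, the same 300 ms skew no longer matters.
much_newer = stamp(10_000 + 600_000, 0)
skew_still_matters = older > much_newer

print(older_survives, skew_still_matters)
```

The window in which the wrong row can survive is bounded by the clock difference, which is why better synchronization shrinks the problem without being required for correctness.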
The system clock is used only to keep the counters somewhat synchronized
during a connection loss, so that they retain some degree of "last update"
meaning. Without that, continuing autonomously during a network outage
is just not practical.
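
One way to picture a counter that the system clock keeps roughly in step is sketched below. This is an assumption about how such a counter could look, not the actual proposal: the class `ClockDrivenCounter`, its methods, and the microsecond resolution are hypothetical.

```python
import time


class ClockDrivenCounter:
    """Monotonic counter that is pulled forward by the local system clock."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the example can use fake clocks
        self._last = 0

    def next(self) -> int:
        # Never go backwards, never repeat, but jump ahead to the wall clock
        # (in microseconds) whenever the clock is ahead of the counter.
        now_us = int(self._clock() * 1_000_000)
        self._last = max(self._last + 1, now_us)
        return self._last

    def observe(self, remote_stamp: int) -> None:
        # Lamport-style catch-up when a remote stamp arrives after reconnect.
        self._last = max(self._last, remote_stamp)


# Busy node: 50,000 commits while its (fake) clock reads t = 1000 s.
busy = ClockDrivenCounter(clock=lambda: 1000.0)
for _ in range(50_000):
    first_stamp = busy.next()

# Idle node, disconnected: a single commit 10 minutes later, at t = 1600 s.
idle = ClockDrivenCounter(clock=lambda: 1600.0)
later_stamp = idle.next()

# The later update now carries the larger stamp despite the idle node's
# low commit volume, so last update wins behaves as expected.
print(first_stamp, later_stamp, later_stamp > first_stamp)
```

With the clock feeding the counter, a node that commits rarely still advances at roughly the same rate as a busy one, so the stamps stay comparable across an outage.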
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #