From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> |
Cc: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: 2-phase commit |
Date: | 2003-09-27 09:34:34 |
Message-ID: | 200309271034.34880.dev@archonet.com |
Lists: | pgsql-hackers |
On Saturday 27 September 2003 06:59, Tom Lane wrote:
> Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> >> ... You can make this work, but the resource costs
> >> are steep.
> >
> > So, after 'n' seconds of waiting, we abandon the slave and the slave
> > abandons the master.
>
> [itch...] But you surely cannot guarantee that the slave and the master
> time out at exactly the same femtosecond. What happens when the comm
> link comes back online just when one has timed out and the other not?
> (Hint: in either order, it ain't good. Double plus ungood if, say, the
> comm link manages to deliver the master's "commit confirm" message a
> little bit after the master has timed out and decided to abort after all.)
>
> In my book, timeout-based solutions to this kind of problem are certain
> disasters.

I might be (well, am actually) a bit out of my depth here, but surely what
happens is that if you have machines A, B and C, and *any* of them thinks
machine C has a problem, then C is treated as having a problem. If C can still
communicate with the others, it is told to reinitialise/go away/start the
sirens. If C can't communicate, it's all a bit academic.

Granted, if you have intermittent problems on a link and set your timeouts
badly, you'll have a very brittle system. But if A thinks C has died, you
can't just reverse that decision.
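
The rule above (one member's suspicion is final, and a node that reappears is told to reinitialise rather than silently readmitted) can be sketched roughly as follows. This is a minimal illustration only: the `Cluster` class and its methods are hypothetical, it assumes the surviving nodes already agree on a single membership view (how they reach that agreement is ignored), it is not PostgreSQL code, and it does not by itself settle the timeout race Tom describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical membership view shared by the surviving nodes.

    Assumption: all surviving nodes see the same view; this sketch
    does not model how they agree on it.
    """
    members: set[str]
    evicted: set[str] = field(default_factory=set)

    def suspect(self, reporter: str, target: str) -> None:
        # A suspicion from any current member is enough to evict the target.
        # Reports from nodes that are already evicted are ignored.
        if reporter in self.members and target in self.members:
            self.members.discard(target)
            self.evicted.add(target)

    def accept_message(self, sender: str) -> str:
        # Eviction is one-way: a node that comes back is never readmitted
        # silently; it is told to reinitialise instead.
        if sender in self.evicted:
            return "reinitialise"
        if sender in self.members:
            return "ok"
        return "unknown"


cluster = Cluster(members={"A", "B", "C"})
cluster.suspect("A", "C")            # A times out waiting for C
print(cluster.accept_message("C"))   # link comes back: "reinitialise"
print(cluster.accept_message("B"))   # "ok"
```

The point of making eviction irreversible is that a late message from C (such as a delayed commit confirmation) can never flip the group's decision back; C has to rejoin from a clean state.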
--
Richard Huxton
Archonet Ltd