Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> But more importantly, it can happen by accident. Someone trips on
> the power plug of the slave on Friday, and it goes unnoticed until
> Monday when DBA comes to work.
We've had people unplug things by accident exactly that way. :-/
We've also had replication across part of our WAN go down for the
better part of a day because a beaver chewed through a fiber optic
cable where it ran through a marsh. Our (application-framework-based)
replication just picks up where it left off, without any intervention,
when connectivity is restored. I think it would be a mistake to
design something less robust than that.
By the way, we don't use any state transitions for this, other than
keeping track of when we seem to have a working connection. The
client side knows the last transaction it received, and when its reconnection attempts
eventually succeed it makes a request of the server side to provide a
stream of transactions from that point on. The response to that
request continues indefinitely, as long as the connection is up, which
can be months at a time.
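A minimal sketch of that resume-from-last-position scheme follows. It is illustrative only and not our actual framework code. The names `Server`, `Client`, `stream_from`, and `ConnectionLost` are hypothetical. The "link" is simulated by dropping the connection at chosen log positions. The only state the client keeps is the next position it needs. Unlike the real stream, which runs for as long as the connection is up, this one ends when the simulated log is exhausted.

```python
from dataclasses import dataclass, field


class ConnectionLost(Exception):
    """Raised by the simulated link when connectivity drops (hypothetical)."""


@dataclass
class Server:
    # Hypothetical transaction log: index i holds transaction i.
    log: list
    # Log positions at which the simulated link drops; each drop fires once.
    drops: set = field(default_factory=set)

    def stream_from(self, position):
        """Yield (position, txn) pairs starting at `position`.

        A real server would keep streaming indefinitely; this one stops
        at the end of the log so the example terminates.
        """
        for pos in range(position, len(self.log)):
            if pos in self.drops:
                self.drops.discard(pos)  # the "beaver" strikes only once here
                raise ConnectionLost(pos)
            yield pos, self.log[pos]


class Client:
    def __init__(self):
        self.applied = []       # transactions applied locally, in order
        self.next_position = 0  # the only replication state: where to resume

    def replicate(self, server, max_attempts=10):
        """Stream from the server, reconnecting after drops.

        Returns the number of reconnections that were needed.
        """
        for attempt in range(max_attempts):
            try:
                for pos, txn in server.stream_from(self.next_position):
                    self.applied.append(txn)
                    self.next_position = pos + 1
                return attempt
            except ConnectionLost:
                # Real code would sleep or back off before retrying.
                continue
        raise RuntimeError("gave up after too many reconnection attempts")
```

In this example, each reconnection asks the server for everything from `next_position` onward. Transactions are therefore neither skipped nor applied twice, and nothing needs to be done by hand when the link comes back.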
-Kevin
"Everything should be made as simple as possible, but no simpler."
- Albert Einstein