Re: Replication Ideas

From:	Chris Travers <chris(at)travelamericas(dot)com>
To:	Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Replication Ideas
Date:	2003-08-25 17:06:22
Message-ID:	3F4A420E.6090604@travelamericas.com
Lists:	pgsql-general pgsql-hackers pgsql-performance

Ron Johnson wrote:

>This is vaguely similar to Two Phase Commit, which is a sine qua
>non of distributed transactions, which is the s.q.n. of multi-master
>replication.

I may be wrong, but if I recall correctly, one of the problems with a
standard two-phase commit is that if one server goes down, the other
masters cannot commit their transactions. The cluster would then be
unable to commit whenever any single node is down, so its downtime
would be roughly the combined downtime of all of its nodes. This is a
real problem. Of course, my understanding of Two Phase Commit may be
incorrect, in which case I would appreciate it if someone could point
out where I am wrong.
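
To put rough numbers on that concern, here is a small sketch. It
assumes node failures are independent and that each node is up a fixed
fraction of the time. The function names and the 99% figure are made up
for illustration and do not come from any real system.

```python
def strict_commit_availability(node_uptime: float, nodes: int) -> float:
    """Fraction of time a commit can succeed if EVERY node must be up.

    Assumption (not from the original post): failures are independent,
    so the per-node uptime fractions simply multiply.
    """
    if not 0.0 <= node_uptime <= 1.0:
        raise ValueError("node_uptime must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return node_uptime ** nodes


def strict_commit_downtime(node_uptime: float, nodes: int) -> float:
    """Fraction of time at least one node is down, blocking commits."""
    return 1.0 - strict_commit_availability(node_uptime, nodes)


if __name__ == "__main__":
    # Hypothetical example: three nodes, each up 99% of the time.
    print(strict_commit_downtime(0.99, 3))
```

With three nodes that are each down 1% of the time, the cluster would
be unable to commit a little under 3% of the time, which is close to
the sum of the individual downtimes.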

It had occurred to me that the issue is one of failure handling more
than one of concept. That is, the problem is how one node's failure is
handled rather than the fundamental structure of Two Phase Commit. If a
single node fails, we don't want that to take down the whole cluster,
and I have actually revised my logic a bit more (to make it even
safer). In this I assume that:

1: General failures on any one node are rare
2: A failure is more likely to prevent a transaction from being
committed than allow one to be committed.

This hot-failover solution requires transparency from the client's
perspective, i.e. the client should not have to choose a different
server should one go down, and should not need to know when a server
comes back up. This also means that we need to assume that a load
balancing solution can be a part of the clustering solution. I would
assume that this would require a shared IP address for the public
interface of the server and a private communications channel where each
node has a separate IP address (similar to Microsoft's implementation
of Network Load Balancing). Also, different transactions within a
single connection should be able to be handled by different nodes, so
if one node goes down, users don't have to reconnect.

So here is my suggested logic for high availability/load balanced
clustering:

1: All nodes recognize each user connection and delegate transactions
rather than connections.

2: At the beginning of a transaction, the nodes decide which of them
will take it. Any operation which does not change the data or schema of
the database is handled exclusively on that node. Other operations are
distributed across nodes.

3: When the transaction is committed, the nodes "vote" on whether the
commit is valid. Majority rules, and the minority must remove
themselves from the cluster until they can synchronize their databases
with the remaining masters. If the vote is split 50/50 (e.g. one node
fails in a 2 node cluster), success is considered more likely to be
valid than failure (per assumption 2 above), and the node(s) which
failed to commit the transaction must remove themselves from the
cluster until they can recover.
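
The voting rule in step 3 can be sketched as follows. This is only an
illustration of the tally itself, not a replication implementation:
the names VoteResult and tally_commit_votes are hypothetical, and the
sketch assumes every node reports a plain yes/no on whether it could
commit.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class VoteResult:
    committed: bool          # outcome the cluster accepts
    evicted: FrozenSet[str]  # nodes that must leave and resynchronize


def tally_commit_votes(votes: Dict[str, bool]) -> VoteResult:
    """Decide a commit from per-node votes (True = node could commit).

    Majority rules. An even split is resolved in favor of the commit,
    following the assumption that a failure is more likely to block a
    valid commit than to let an invalid one through.
    """
    if not votes:
        raise ValueError("at least one node must vote")
    yes = {node for node, ok in votes.items() if ok}
    no = set(votes) - yes
    committed = len(yes) >= len(no)  # ties count as success
    # Whichever side lost the vote removes itself from the cluster.
    losers = no if committed else yes
    return VoteResult(committed, frozenset(losers))


if __name__ == "__main__":
    # Two-node cluster where node "b" failed to commit.
    print(tally_commit_votes({"a": True, "b": False}))
```

In the two-node example the split vote is treated as a successful
commit and node "b" is the one that leaves. The sketch says nothing
about how votes are collected or what happens when a node never
answers; those parts are left open here.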

Best Wishes,
Chris Travers

In response to

Re: Replication Ideas at 2003-08-24 06:13:08 from Ron Johnson

Responses

Re: Replication Ideas at 2003-08-25 17:38:16 from Ron Johnson
Re: Replication Ideas at 2003-08-25 18:24:41 from Alvaro Herrera
