A Replication Idea

From: Orion Henry <orion(at)trustcommerce(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: A Replication Idea
Date: 2002-02-19 18:11:32
Message-ID: a4u4gi$2et0$1@jupiter.hub.org
Lists: pgsql-general


I've been thinking about replication and wanted to throw out an idea to see
how fast it gets torn apart. I'm sure the problem can't be this easy but I
can't think of why.

Ok... Let's say you have two fresh databases, both empty. You set up a
postgres proxy for them. The proxy works like this:

It listens on port 5432.
It pools connections to both real databases.
It is very simple, just forwarding requests and responses back and
forth between client and server. A client can connect to
the proxy and not be able to tell that it is not an actual
postgres database.
When connections are made to it, it proxies connections to both
back-end databases.
If an insert/update/delete/DDL command comes, it forwards it to both
machines.
If a query comes down the line it forwards it to one machine or the
other.
If one of the machines goes offline or is not responding the proxy
queues up all update transactions intended for it and stops
forwarding queries to it until it comes back online and all
queued transactions have been committed.
A new machine can be inserted into the cluster. When the proxy is
alerted to this, its first action would be to run
pg_dumpall against one of the functional databases and pipe
the output to the new one. Until that finishes, the new
machine is treated as an unreachable database and all update
transactions are queued for when the dump/rebuild is complete.
If a machine dies in catastrophic failure it can be removed from the
cluster, and once the machine is fixed, re-inserted as per
above.
If there were some SQL command for determining the load a machine
is experiencing the proxy could intelligently balance the
load to the machines in the cluster that can handle it.
If the proxy were to fail, clients could safely connect to one of
the back end databases in read-only mode until the proxy
came back up.
The proxy would store a log of incomplete transactions in some kind
of persistent storage for all the databases it's connected
to, so should it die, it can resume right where it left off
assuming the log is intact.
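
The routing rules in the list above can be sketched in a few lines of
Python. This is only a toy model under loud assumptions: Backend and Proxy
are hypothetical names, statements are classified by their first keyword
only, nothing here speaks the real postgres wire protocol, and the queue
lives in memory rather than in the persistent log described above.

```python
from collections import deque

# Assumption: a statement is a "write" if it starts with one of these words.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"}


class Backend:
    """Hypothetical stand-in for one real database."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.applied = []      # writes this backend has executed
        self.queued = deque()  # writes held while it is offline
        self.reads = 0         # number of queries routed here


class Proxy:
    """Toy router: writes go to every backend, reads go to one."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._turn = 0

    @staticmethod
    def is_write(sql):
        words = sql.split()
        return bool(words) and words[0].upper() in WRITE_KEYWORDS

    def execute(self, sql):
        """Route one statement; return the names of backends that ran it."""
        if self.is_write(sql):
            ran = []
            for b in self.backends:
                if b.online:
                    b.applied.append(sql)
                    ran.append(b.name)
                else:
                    b.queued.append(sql)  # replayed by bring_online()
            return ran
        # Queries only go to backends that are online and fully caught up.
        ready = [b for b in self.backends if b.online and not b.queued]
        if not ready:
            raise RuntimeError("no backend available for queries")
        chosen = ready[self._turn % len(ready)]  # simple round-robin
        self._turn += 1
        chosen.reads += 1
        return [chosen.name]

    def bring_online(self, backend):
        """Replay queued writes in order, then resume sending queries."""
        while backend.queued:
            backend.applied.append(backend.queued.popleft())
        backend.online = True
```

For example, with two backends a and b, setting b.online = False and then
executing an INSERT applies it to a and leaves it in b.queued; calling
bring_online(b) replays it so both backends hold the same list of writes.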

With the proxy set up like this you could connect to it as though it were a
database, upload your current data and schema and get most of the benefits
of clustering.

With this setup you could achieve load balancing, fail-over, master-master
replication, master-slave replication, hot swap servers, dynamic addition
and removal of servers and HA-like clustering. The only thing it does not
do is partition data across servers. The only assumption I am aware of
that I am making is that two identical databases, given the same set of
arbitrary transactions, will end up being the same. The only single point
of failure in this system would be the proxy itself. A modification to the
postgres client software could allow automatic fail-over to read-only
connections with one of the back-end databases. Also, the proxy could be
run on a router or other diskless system. I haven't really thought about
it, but it may even be possible to use current HA technology and run a pool
of failover proxies.
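
The one assumption named above (identical databases plus the same
transactions give the same result) can be illustrated with a small Python
model. This is a sketch only: apply_op and replay are hypothetical helpers,
a dict stands in for a table, and every operation is assumed to be
deterministic. In this toy model the replicas agree when they see the
operations in the same order, and can differ when the order differs.

```python
def apply_op(state, op):
    """Apply one deterministic operation to a dict acting as a tiny table."""
    kind, key, value = op
    new_state = dict(state)  # copy so the caller's state is not mutated
    if kind == "upsert":
        new_state[key] = value
    elif kind == "delete":
        new_state.pop(key, None)
    else:
        raise ValueError("unknown operation: %r" % (kind,))
    return new_state


def replay(ops, state=None):
    """Apply operations in order, starting from an empty table by default."""
    state = {} if state is None else state
    for op in ops:
        state = apply_op(state, op)
    return state


log = [
    ("upsert", 1, "a"),
    ("upsert", 2, "b"),
    ("upsert", 1, "c"),
    ("delete", 2, None),
]

replica_a = replay(log)
replica_b = replay(log)  # same operations, same order

# Same operations, but the two writes to key 1 arrive in the other order.
reordered = [log[2], log[0], log[1], log[3]]
replica_c = replay(reordered)

print(replica_a == replica_b)  # same order: states match
print(replica_a == replica_c)  # different order: states differ here
```

In terms of the proxy, this suggests that update statements would need to
reach every back-end database in the same order for the assumption to hold
in this model.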

If the proxy ended up NOT slowing the performance of a standalone,
single-system server, it could become the default connection method to
PostgreSQL. A person could then do an out-of-the-box install of the
database and, on realizing a year later that they really wanted a cluster,
hot-add a server without even restarting the database.

So, long story short, I'd like to get people's comments on this. If it
won't/can't work or has been tried before, I want to hear about it before I
start coding. I find it hard to believe that a replication/clustering
solution could be this easy to implement but I can't think of why this
would not work.

Orion
