Re: Replication Using Triggers

From: gordan(at)bobich(dot)net
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Replication Using Triggers
Date: 2008-01-18 16:09:45
Message-ID: alpine.LRH.1.00.0801181553430.21203@skynet.shatteredsilicon.net
Lists: pgsql-general

On Fri, 18 Jan 2008, Erik Jones wrote:

>> Is there any reason why PostgreSQL replication solutions are all add-on 3rd
>> party ones?
>
> Because no one solution would be appropriate for everyone. The core team and
> contributors feel that their time is better spent on the database itself
> rather than developing and maintaining multiple different replication
> solutions and dealing with the support thereof. What has been done is to add
> some extra hooks in 8.3 for replication triggers that can help to specialize
> when/if a given trigger fires.

Hmm, selective trigger firing sounds interesting.
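
(For anyone following along: the 8.3 hooks in question are per-trigger firing modes plus the session_replication_role setting. The table and trigger names below are made up, just to show the shape of it.)

```sql
-- Mark a trigger to fire only when the session is acting as a replica.
ALTER TABLE accounts ENABLE REPLICA TRIGGER replicate_row;

-- The process applying replicated changes then runs:
SET session_replication_role = replica;
-- Ordinary ('origin') triggers now stay quiet, so applied changes are
-- not replicated again - which prevents circular replication.
```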

>> Is there any reason why replication couldn't be implemented using triggers
>> and a handful of stored procedures?
>
> That's usually how it's done. Well, plus some external user-land application
> libraries.

That's just it - I don't think any user-land libraries would actually be
required. One of the supposed big advantages of MySQL is its
straightforward replication support. It's quite painful to see PostgreSQL
suffer purely for lack of marketing in this department. :-(

>> This is what I have in mind:
>>
>> Have a plperl function that creates connections to all servers in the
>> cluster (replication partners), and issues the supplied write query to
>> them, possibly with a tag of some sort to indicate it is a replicated
>> query (to prevent circular replication).
>>
>> Have a post execute trigger that calls the above replication function if
>> the query was issued directly (as opposed to replicated), and passes it the
>> query it just executed if it was successful.
>>
>> If the replication failed on any node, the whole thing gets rolled back.
>
> That sounds pretty brittle. Do you really want all progress in your
> databases to stop if there is a network issue to a single server?

1) That's what MySQL does (it either ignores errors or stops replication
on encountering an error; which of the two it does is selectable, but
that's about it).

2) If there is a network/server issue, that would be detected because the
$dbh would break. If the $dbh breaks, then plperl can either attempt to
re-connect, or failing that, boot the node from the replication cluster
(could have a node list with active/fenced flag in a separate config
schema). Neither approach would be difficult to implement.

Then it could simply not reconnect the fenced node until the user updates
the node's status in that config schema table. The configuration schema
could itself be replicated.
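
To make that concrete, here is a rough plperlu sketch (untrusted Perl is
needed for DBI). The replication.nodes table, the function name, and the
assumption that the statement text gets passed in are all mine, not an
existing API - a sketch, not a finished implementation:

```sql
-- Hypothetical config schema: replication.nodes(dsn text, active bool)
CREATE OR REPLACE FUNCTION replicate_statement(query text) RETURNS void AS $$
    use DBI;
    my ($query) = @_;
    my $fence = spi_prepare(
        'UPDATE replication.nodes SET active = false WHERE dsn = $1',
        'text');
    my $nodes = spi_exec_query(
        'SELECT dsn FROM replication.nodes WHERE active');
    foreach my $row (@{ $nodes->{rows} }) {
        my $dbh = DBI->connect($row->{dsn}, undef, undef,
                               { RaiseError => 0, PrintError => 0 });
        # On connect or execute failure, fence the node instead of
        # stalling the whole cluster; a human re-activates it later
        # by flipping the active flag back in the config schema.
        unless ($dbh && defined $dbh->do($query)) {
            spi_exec_prepared($fence, $row->{dsn});
        }
    }
$$ LANGUAGE plperlu;
```

An AFTER trigger on each replicated table would call this, skipping the
call when the change arrived via replication (which is exactly what the
8.3 trigger firing modes make easy).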

>> This would effectively give star topology synchronous replication with very
>> little effort, and no need for any external code. Am I missing something
>> obvious that would prevent this from working? It would give replication
>> capabilities better than MySQL's (which can only handle ring based
>> multi-master replication) for the sake of about 100 lines of code. None of
>> the functionality required is new to PostgreSQL, either.
>>
>> Is there an existing implementation of this? Perhaps a perl program that
>> creates the required triggers and stored procedures from looking at a
>> schema?
>
> What you've described here would be pretty simple to implement. However, I
> think you've greatly underestimated the performance issues involved. If you
> need to push data to multiple databases before each transaction commits I
> think you'll find that pretty slow.

Only if transactions are used. I'm basing the requirements on "at least as
good as MySQL", which this would meet without transactions. If
transactions are wanted they could be enabled; otherwise it could just be
fire-and-forget asynchronous replication à la MySQL. Having a choice
between transactions and speed is good. :-)

One thing I haven't yet thought of a good way to do with this approach is
an equivalent of the useful (albeit deprecated) LOAD DATA FROM MASTER
command, which brings a server into sync by dropping and re-loading all
the tables from the master(s) (or rather, peers, in a multi-master star
replication) and then enables it in the replication cluster. That would be
neater than requiring downtime or global write locks. But I guess it could
wait until version 2. :)
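
For what it's worth, the closest stock equivalent I can think of is
dumping the tables from a live peer and restoring them on the joining node
(host, database and table names here are hypothetical):

```shell
# Drop and re-create the listed tables on the new node from peer1's copy.
pg_dump -h peer1 --clean -t accounts -t orders mydb | psql -h newnode mydb
```

That still needs writes to the copied tables held off (or caught up
afterwards) before the node is flipped to active in the config schema, so
it is not a complete substitute.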

> That's why most of the available third
> party solutions are asynchronous. The biggest options out there are
> Slony and Londiste (both master-slave, asynchronous) and Bucardo
> (asynchronous, but supports both master-master and master-slave) which, as
> you would have it, is written in Perl.

I looked at all of the above, and they all seemed (to me at least) to
involve unnecessary complication or limitations I saw as unreasonable (or
both). I looked at Bucardo in detail, and I was rather disappointed to see
that it only supports two master nodes at the moment.

Thanks.

Gordan
