From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 12:24:53
Message-ID: dekdak$1v8e$1@news.hub.org
Lists: pgsql-general
Another tidbit I'd like to add. What has helped a lot in implementing
high-latency master-master replication is writing our software with a
business process model in mind where data is not posted directly to the
final tables. Instead, users are generally allowed to enter anything --
it could be incorrect, incomplete, or beyond the user's rights -- and
the data is still dumped into "pending" tables for people with rights
to fix/review/approve later. Only after that process is the data posted
to the final tables. (Good data entered on the first try still gets
pended -- the validation phase simply assumes the user who entered the
data is also the one who fixed/reviewed/approved it.)
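
A minimal sketch of the pending/final split, assuming a hypothetical
invoice workflow (none of these table or column names come from our
actual schema):

    -- Raw user entry lands here, however incomplete or incorrect.
    CREATE TABLE invoice_pending (
        pending_id   bigint PRIMARY KEY,
        home_server  char(1) NOT NULL,   -- which server may post this row
        entered_by   text NOT NULL,
        entered_at   timestamptz NOT NULL DEFAULT now(),
        status       text NOT NULL DEFAULT 'pending',
        payload      text
    );

    -- Only fixed/reviewed/approved data ever lands here.
    CREATE TABLE invoice (
        invoice_id   bigint PRIMARY KEY,
        approved_by  text NOT NULL,
        approved_at  timestamptz NOT NULL,
        payload      text NOT NULL
    );
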
In terms of replication, this model allows users to enter data on any
server. The pending records then get replicated to every server. Each
specific server then looks at its own dataset of pendings to post to
final tables. Final data is then replicated back to all the
participating servers.
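
Continuing the hypothetical sketch above, the posting job on each
server might claim only the pendings it owns, something like:

    -- Run periodically on each server; 'A' is this server's code.
    -- Final rows created here replicate back out to the other servers.
    -- (For first-try-good data, the enterer counts as the approver.)
    INSERT INTO invoice (invoice_id, approved_by, approved_at, payload)
    SELECT pending_id, entered_by, now(), payload
    FROM invoice_pending
    WHERE status = 'approved' AND home_server = 'A';
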
There may be a delay for the user if he/she is working on a server that
doesn't have rights to post his data. However, the pending->post model
gets users used to the idea that (1) all data is entered in one large
swoop and validated/posted afterwards and (2) data can/will sit in
pending for a period of time until it is acted upon by somebody/some
server with the proper authority. Hence users aren't expecting results
to pop up on the screen the moment they press the submit button.
William Yu wrote:
> Yes, it requires a lot of foresight to do multi-master replication --
> especially across high latency connections. I do that now for 2
> different projects. We have servers across the country replicating
> data every X minutes, with custom app logic resolving conflicting data.
>
> Allocation of unique IDs that don't collide across servers is a must.
> For 1 project, instead of using numeric IDs, we use CHAR and prepend a
> unique server code, so record #1 on server A is A0000000001 versus
> x0000000001 on other servers (x being that server's code). For the
> other project, we were too far along in development to change all our
> numerics into chars, so we wrote custom sequence logic to divide our
> 10-billion ID space into 1-X billion for server 1, X-Y billion for
> server 2, etc. (Both approaches are sketched below, after the quote.)
>
> With this step taken, we then had to isolate (1) transactions that
> could run on any server w/o issue (where we always take the newest
> record), (2) transactions that required an amalgam of all actions and
> (3) transactions that had to be limited to "home" servers.
> Record-keeping stuff where we keep a running history of all changes
> fell into the first category. It would have been no different than 2
> users on the same server updating the same object at different times
> during the day. Updating of summary data fell into category #2 and
> required parsing the change history of individual elements. Category
> #3 was financial transactions requiring strict locks; these were
> divided up by client/user space and restricted to the user's home
> server. This case would not allow auto-failover. Instead, it would
> require some prolonged threshold of downtime for a server before full
> financials are allowed on backup servers.
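
To make the ID allocation above concrete, here is a rough sketch of
both approaches; the sequence names and range boundaries are made up
for illustration, not taken from either project:

    -- Numeric approach: carve the shared ID space into per-server ranges.
    CREATE SEQUENCE invoice_id_seq        -- on server 1
        START WITH 1
        MINVALUE 1
        MAXVALUE 2000000000;

    CREATE SEQUENCE invoice_id_seq        -- on server 2
        START WITH 2000000001
        MINVALUE 2000000001
        MAXVALUE 4000000000;

    -- CHAR approach: per-server prefix plus a zero-padded local sequence.
    SELECT 'A' || lpad(nextval('local_seq')::text, 10, '0');  -- 'A0000000001'

And for category (1), take-the-newest-record resolution can be as
simple as a timestamp comparison during the merge (again, hypothetical
columns):

    -- Apply an incoming replicated copy only if it is newer than ours.
    UPDATE invoice i
    SET payload = r.payload, modified_at = r.modified_at
    FROM invoice_incoming r
    WHERE i.invoice_id = r.invoice_id
      AND r.modified_at > i.modified_at;
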