| From: | Decibel! <decibel(at)decibel(dot)org> | 
|---|---|
| To: | David <wizzardx(at)gmail(dot)com> | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: Database design: Data synchronization | 
| Date: | 2008-06-19 15:54:31 | 
| Message-ID: | D929C937-343F-416A-87BF-959EC730BEED@decibel.org | 
| Lists: | pgsql-general | 
On Jun 18, 2008, at 7:07 AM, David wrote:
> - Many foreign keys weren't enforced
>
> - Some fields needed special treatment (eg: should be unique, or
> behave like a foreign key ref, even if db schema doesn't specify it.
> In other cases they need to be updated during the migration).
>
> - Most auto-incrementing primary keys (and related foreign key
> references) needed to be updated during migration, because they are
> already used in the destination database for other records.
>
> - Many tables are undocumented, some fields have an unknown purpose
>
> - Some tables didn't have fields that can be used as a 'natural' key
> for the purpose of migration (eg: tables which only exist to link
> together other tables, or tables where there are duplicate records).
>
> I wrote a Python script (using SQLAlchemy and Elixir) to do the above
> for our databases.
>
> Are there any existing migration tools which could have helped with
> the above? (it would have required a *lot* of user help).
>
> Are there recommended ways of designing tables so that synchronization
> is easier?
>
> The main thing I've read about is ensuring that all records have a
> natural key of some kind, eg GUID. Also, your migration app needs to
> have rules for conflict resolution.
Well, it sounds like you've got a good list of what NOT to do. The  
first step is to make sure that you have a good database design,  
outside of replication considerations. Most tables should have  
natural unique keys; make sure you have FKs, document things (see  
the COMMENT ON command), etc. If you have low data quality to start  
with, spreading that all over is just going to make things worse.
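Below is a minimal sketch of the "good design first" advice. It is written in Python because the original migration script used Python. So that it runs anywhere, it uses the stdlib `sqlite3` module as a stand-in for PostgreSQL. The `office`/`customer` tables and the `try_insert` helper are hypothetical. The sketch shows a natural unique key and an enforced foreign key rejecting bad rows at insert time, instead of letting them leak into other databases. SQLite has no COMMENT ON. In Postgres you would also document the schema with statements such as `COMMENT ON TABLE customer IS '...'`.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for PostgreSQL so the sketch is runnable.
SCHEMA = """
CREATE TABLE office (
    office_id INTEGER PRIMARY KEY,
    code      TEXT NOT NULL UNIQUE          -- natural key, e.g. 'LON'
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    office_id   INTEGER NOT NULL REFERENCES office(office_id),
    email       TEXT NOT NULL,
    UNIQUE (office_id, email)               -- natural key within one office
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    demo = make_db()
    print(try_insert(demo, "INSERT INTO office (office_id, code) VALUES (?, ?)", (1, "LON")))  # True
    print(try_insert(demo, "INSERT INTO office (office_id, code) VALUES (?, ?)", (2, "LON")))  # False: duplicate natural key
    print(try_insert(demo, "INSERT INTO customer (office_id, email) VALUES (?, ?)", (99, "a@example.com")))  # False: no office 99
```

The constraints turn problems like "many foreign keys weren't enforced" into errors at write time. Those errors are far cheaper to deal with than the same problems discovered during a later migration.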
For the actual replication, there isn't really a multi-master  
solution for Postgres. Your best bet is to design the system  
so that you don't have conflicts (i.e., if you have a bunch of branch  
offices, each one is responsible for its own data). You can then  
build something akin to multi-master using Londiste and PgQ (from Skype's SkyTools).
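The "each office owns its own data" rule can be sketched in Python as follows. This is not Londiste or PgQ, which move the changes between databases. It only illustrates the ownership check that a merge step might apply. All names are hypothetical. The sketch rests on two assumptions. First, every row carries a random UUID key, so rows created independently at different offices never collide, unlike the auto-incrementing keys described above. Second, each row has an `owner` field naming the one office allowed to write it. Rows that break the rule are returned for an explicit conflict-resolution step rather than being applied.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    row_id: uuid.UUID  # globally unique key, safe to generate at any office
    owner: str         # hypothetical: the single office allowed to write this row
    payload: str


def new_row(office: str, payload: str) -> Row:
    return Row(uuid.uuid4(), office, payload)


def merge(central: dict[uuid.UUID, Row], sender: str, changes: list[Row]) -> list[Row]:
    """Apply `sender`'s changes to the central copy.

    A change is accepted only if the sender owns the row, both in the
    incoming version and in any existing central version. Everything else
    is returned untouched so a conflict-resolution rule can decide.
    """
    rejected = []
    for row in changes:
        existing = central.get(row.row_id)
        if row.owner != sender or (existing is not None and existing.owner != sender):
            rejected.append(row)
        else:
            central[row.row_id] = row
    return rejected


if __name__ == "__main__":
    central: dict[uuid.UUID, Row] = {}
    lon = new_row("LON", "order 1")
    merge(central, "LON", [lon])
    merge(central, "NYC", [new_row("NYC", "order 2")])
    # NYC tries to edit a row owned by LON; it comes back as rejected.
    print(merge(central, "NYC", [Row(lon.row_id, "NYC", "edited by NYC")]))
```

Because writers never overlap, the merge needs no last-writer-wins guesswork. The rejected list should stay empty unless something violates the design.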
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828