From: Richard Huxton <dev(at)archonet(dot)com>
To: Michael Meier <mikem934(at)googlemail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to design for cheap reliability with PostgreSQL?
Date: 2008-12-02 11:59:36
Message-ID: 49352328.3010802@archonet.com
Lists: pgsql-general
Michael Meier wrote:
> Hi,
>
> I am in the process of designing a distributed application (Java EE 5)
> which has demand for database functionality at several points. Some of
> these databases need to provide really mission critical reliability,
> i.e. it would be the end of our world if we were to suffer data
> loss or silent data corruption. Availability and serviceability are nice
> but not that important. This means that it isn't that important that a
> transaction is committed, but if the transaction is committed, the
> data should be 100% safe - without any chance of data loss or
> corruption.
I'm assuming you mean 99.99..9% safe.
> On the other hand, real proper reliability is expensive and we do not
> have the money for serious hardware or "unbreakable" database
> licences. I assume a DB size of < 50GB with moderate write I/O and
> less moderate read I/O, so I would like to get a machine with 64 GB
> Ram for in-memory caching.
An "unbreakable" database isn't necessarily going to help you with the
sort of errors you're thinking of.
> I am thinking of PostgreSQL 8.3.n with n>=5 on top of Solaris 10 with ZFS.
Not sure if it's of interest, but there's been discussion of block-level
checksums for 8.4. You might want to check the archives for the -hackers
mailing list. I've never used ZFS, but from what I've read it would make
sense.
> So an extremely cheap low end hardware platform for the problem might
> look like this:
You and I are at different scales of "cheap" :-)
> Dual Quad Core Xeon Machine
> 64 GB FB-DIMM ECC RAM
> Two 8-port SAS controllers
> Internal 16 x 73 GB 15k hard drives, partitioned as follows:
> - 4 drives: 3-way RAID-1 ZFS mirror with one additional hot spare for Solaris
> - 6 drives: 4-way RAID-1 ZFS mirror with two additional hot spares for
> PostgreSQL tables
> - 6 drives: 4-way RAID-1 ZFS mirror with two additional hot spares for
> PostgreSQL logging
That's a lot of disks for a 50GB database, unless you've got a *lot* of
writes.
> ZFS-Scrubbing every night.
> approx. $24,000 list price from Sun (x4250) with Sun Platinum support
> for the hardware and Solaris,
> or approx. $12,000 self-assembled from Newegg.
Well, several of either of those must surely be preferable to one of your
expensive version.
> And an extremely expensive high-end solution for the problem might
> look like this:
> Sun SPARC Enterprise M4000 Server with two SPARC64 VII CPUs
> 64 GB registered ECC RAM
> Sun M-Series RAS features (checksumming of CPU registers, etc.)
> An external drive array (e.g. J4400) with 16 drives, partitioned like
> the internal drives above.
> ZFS-Scrubbing every night.
> approx. $160,000
>
> This insane price difference would get us data integrity beyond the
> ZFS checksumming feature: It would protect the data even in the CPU
> registers.
>
> So the questions are:
>
> 1) Is this necessary? Do bit errors happen with configurations like
> the cheap xeon one above?
You'll probably want to estimate the odds of earthquake/volcano/nuclear
war and see how much effort it's worth.
> 2) If this is necessary, is there any (software) way around it? Is
> there a PostgreSQL clustering solution available, where the cluster
> nodes check each others data integrity?
Hmm - there are commercial variants of PG that do clustering/replication
(Greenplum and EnterpriseDB for a start) but I'm not sure if it's quite
what you're after.
I don't know what sort of queries you're running, but you could do
something like:
- pgpool to send queries to two front-end servers
- both replicate to a third server over Slony, which compares the
replicated copies (synchronisation issues here though)
- you keep the replication stream or keep separate WAL archives so if a
problem is found you stop everything and rewind until you get to a known
good point (see the sketch below).
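For that last part the stock WAL archiving / point-in-time recovery machinery
should be enough - no extra code to write. Roughly like this (just a sketch:
the archive path and the recovery target time are made up, and in 8.3 you need
archive_mode as well as archive_command):

    # postgresql.conf on the primary - ship every completed WAL segment to an archive
    archive_mode = on                             # GUC added in 8.3
    archive_command = 'cp %p /wal_archive/%f'     # /wal_archive is an assumed shared path

    # recovery.conf on the box you rebuild from a base backup
    restore_command = 'cp /wal_archive/%f %p'
    recovery_target_time = '2008-12-01 03:00:00'  # i.e. your last known good point

Restore the base backup, drop that recovery.conf in place and start the
server; it replays WAL up to the target time and stops - which is your
"rewind to a known good point".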
Of course, all this just introduces more code with more chances of a
bug. It would *look* more reliable, but I'm not sure it would be.
HTH
--
Richard Huxton
Archonet Ltd