From: | Alvaro Herrera <alvherre(at)atentus(dot)com> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-general(at)postgresql(dot)org>, <vmikheev(at)SECTORBASE(dot)COM> |
Subject: | Re: Database corruption? |
Date: | 2001-10-31 01:20:31 |
Message-ID: | Pine.LNX.4.33L2.0110302216150.16301-100000@aguila.protecne.cl |
Lists: | pgsql-general |
On Wed, 31 Oct 2001, Tatsuo Ishii wrote:
> > It may be unthinkable hubris to say this, but ... I am starting to
> > notice that a larger and larger fraction of serious trouble reports
> > ultimately trace to hardware failures, not software bugs. Seems we've
> > done a good job getting data-corruption bugs out of Postgres.
> >
> > Perhaps we should reconsider the notion of keeping CRC checksums on
> > data pages. Not sure what we could do to defend against bad RAM,
> > however.
Maybe not defend against it, but at least you can detect and warn the
user that something is likely to go wrong.
> I have been troubled by a really strange problem. Populating with huge
> data (~7GB) causes random failures, for example a mysterious unique
> constraint violation, count(*) showing an incorrect number, and pg_temp*
> tables suddenly disappearing (the table in question is a temporary table).
Remember the guy who had to change relnatts by hand to get a table back
online? It was bad RAM. One may wonder just how big the coincidence was
to get exactly that bit changed... Well, a bad CRC checksum would've
warned him right away.
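A minimal sketch of the page-checksum idea, written in Python rather than the server's C: compute a CRC-32 over each data page when it is written, store it in the page, and recompute it when the page is read back. The 8 KB page size, the 4-byte checksum at offset 0, and the helpers `stamp_page`, `verify_page` and `flip_bit` are hypothetical and do not reflect PostgreSQL's actual page layout. Because CRC-32 detects every single-bit error, one flipped bit (as in the relnatts case) makes verification fail. It only catches corruption that happens after the checksum is computed; a bit flipped in RAM before the page is stamped still goes unnoticed.

```python
import struct
import zlib

PAGE_SIZE = 8192              # assumption: 8 KB pages
HEADER = struct.Struct("<I")  # hypothetical layout: 4-byte CRC at the start of the page


def stamp_page(payload: bytes) -> bytes:
    """Build a page image whose header holds the CRC-32 of the payload."""
    if len(payload) != PAGE_SIZE - HEADER.size:
        raise ValueError("payload must fill the rest of the page exactly")
    return HEADER.pack(zlib.crc32(payload)) + payload


def verify_page(page: bytes) -> bool:
    """Recompute the CRC on read; a mismatch means the page changed after it was stamped."""
    (stored,) = HEADER.unpack_from(page, 0)
    return stored == zlib.crc32(page[HEADER.size:])


def flip_bit(page: bytes, bit: int) -> bytes:
    """Simulate bad RAM by flipping a single bit of the page image."""
    buf = bytearray(page)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


# Usage: a clean page verifies, a page with one flipped bit does not.
payload = (b"tuple data " * 1000)[: PAGE_SIZE - HEADER.size]
page = stamp_page(payload)
corrupted = flip_bit(page, 8 * 100 + 3)
print(verify_page(page), verify_page(corrupted))
```

Checking on every read costs one CRC pass per page, which is the trade-off being weighed against catching hardware faults before they spread into the data.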
--
Alvaro Herrera (<alvherre[(at)]atentus(dot)com>)
"If you want to be creative, learn the art of wasting time"