From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: postgres bee <postgres_bee(at)live(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5055: Invalid page header error
Date: 2009-09-16 00:23:27
Message-ID: 4AB02FFF.1050408@postnewspapers.com.au
Lists: pgsql-bugs
postgres bee wrote:
> Correct me if I am wrong, but I thought one of the primary tasks, if not the most important one, for relational databases is to ensure that no data loss ever occurs. Which is why I was initially surprised that the issue did not get enough importance. But now it seems more like the community not knowing what triggered the issue, i.e. not knowing which component to fix.
... or if there is anything to fix.
PostgreSQL has to trust the hardware and the OS to do their jobs. If the
OS is, unbeknownst to PostgreSQL, flipping the high bit in any byte
written at exactly midnight on a Tuesday, there's nothing PostgreSQL can
do to prevent it.
If Pg checksummed blocks and read each block back after writing, it could
possibly detect *immediate* write problems - but then, OS caching would
probably hide the issue unless Pg bypassed the OS's caching and forced
direct disk reads. To say this would perform poorly is a spectacular
understatement.
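
(For what it's worth, the read-back idea would look very roughly like the
sketch below. This is not anything Pg actually does - the file name, the
buffer alignment and the toy checksum are all just assumptions for
illustration - but it shows the shape of the check: write, fsync, re-read
with O_DIRECT so the data actually comes back off the disk, and compare.)

/*
 * Sketch only: write a block, fsync it, then re-read it with O_DIRECT so
 * the data comes back from the disk rather than the OS page cache, and
 * compare checksums. Not PostgreSQL code; the path, block size and toy
 * checksum are illustrative assumptions.
 */
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* PostgreSQL's default page size */

/* Toy checksum stand-in; a real scheme would use a proper CRC. */
static uint32_t toy_sum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (sum << 1 | sum >> 31) ^ buf[i];
    return sum;
}

int main(void)
{
    const char *path = "scratch_block.dat";   /* assumed scratch file */
    unsigned char *wbuf, *rbuf;

    /* O_DIRECT needs sector-aligned buffers. */
    if (posix_memalign((void **) &wbuf, 4096, BLOCK_SIZE) != 0 ||
        posix_memalign((void **) &rbuf, 4096, BLOCK_SIZE) != 0)
        return 1;
    memset(wbuf, 0xAB, BLOCK_SIZE);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || write(fd, wbuf, BLOCK_SIZE) != BLOCK_SIZE || fsync(fd) != 0)
        return 1;
    close(fd);

    /* Re-open with O_DIRECT so the read bypasses the OS cache. */
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 || read(fd, rbuf, BLOCK_SIZE) != BLOCK_SIZE)
        return 1;
    close(fd);

    if (toy_sum(wbuf, BLOCK_SIZE) != toy_sum(rbuf, BLOCK_SIZE))
        fprintf(stderr, "block came back different from what was written!\n");
    else
        printf("block read back intact (this time, on this box)\n");

    free(wbuf);
    free(rbuf);
    return 0;
}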
Even with such a scheme, there's no guarantee that data isn't being
mangled after it hits disk. The RAID controller might be "helpfully"
"fixing" parity errors in a RAID 5 volume using garbage being returned
by a failing disk during periodic RAID scrubbing. An SSD might have a
buggy wear leveling algorithm that results in blocks being misplaced.
And so on.
Now, in those cases you might see some benefit from an OS-level
checksumming file system, but that won't save you from OS bugs.
It's the OS and the hardware's job to get the data Pg writes to disk
onto disk successfully and accurately, keep it there, and return it
unchanged on request. If they can't do that, there's nothing Pg can do
about it.
> But I do have one overriding question - since postgres is still running on the same hardware, wouldn't it rule out hardware as the primary suspect?
Absolutely not. As Tom Lane noted, such faults are generally intermittent.
For example: I had lots of "fun" years ago tracking down an issue caused
by RAID scrubbing on a defective 3Ware 8500-8 card. The card ran fine in
all my tests, and the system would remain in good condition for a week
or two, but then random file system corruption would start arising.
Files would be filled with garbage or with the contents of other files,
the file system structure would get damaged and need fsck, files would
vanish or turn up in lost+found, etc etc. It turned out that by default
the controller ran a weekly parity check - which was always failing due
to a defect in the controller, triggering a rebuild. The rebuild, due
to the same issue with the controller, would proceed to merrily mangle
the data on the array in the name of restoring parity.
3Ware replaced the controller and all was well.
Now, what's PostgreSQL going to do when it's run on hardware like that?
How can it protect itself?
It can't.
Common causes of intermittent corruption include:
- OS / file system bugs
- Buggy RAID drivers and cards, especially "fake raid" cards
- Physically defective or failing hardware RAID cards
- Defective or overheating memory / CPU, resulting in intermittent
memory corruption that can affect data written to or read from disk.
This doesn't always show up as crashing processes etc. either; such things
can be REALLY quirky.
- Some desktop hard disks, which are so desperate to ensure you don't
return them as defective that they'll do scary things to remap blocks.
"Meh, it was unreadable anyway, I'll just re-allocate it and return
zeroes instead of reporting an error"
Sometimes bugs will only arise in certain circumstances. A RAID
controller bug might only be triggered by a Western Digital "Green" hard
disk with a 1.0 firmware*. An issue with a 2.5" laptop SSD might only
arise when a write is committed to it immediately before it's powered
off as a laptop goes into sleep. A buggy disk might perform an
incomplete write of a block if power from the PSU momentarily drops
below optimal levels because someone turned on the microwave on the same
phase as the server. The list is endless.
What it comes down to, though, is that this issue manifests itself
first as some corrupt blocks in one of the database segments. There's
absolutely no information available about when they got corrupted or by
what part of the system. It could even have been anti-virus software on
the system "disinfecting" them of a suspected virus, i.e. something
totally outside the normal parts of the system Pg is concerned with. So,
unless an event is noticed that is associated with the corruption, or
some way to reproduce it is found, there's no way to tell whether any
given incident could be a rarely triggered Pg bug (i.e. Pg writes wrong
data, writes garbage to files, etc.) or whether it's something external
like hardware or interfering 3rd-party software.
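
(To make the error itself concrete: the "invalid page header" message
comes from sanity checks on each 8 KB page header as it's read in. The
rough sketch below - and it is only a sketch, the struct abbreviates the
real PageHeaderData layout, which differs between versions - walks a
segment file and flags blocks whose headers fail that kind of check. It
tells you a block is damaged, but nothing about who damaged it, which is
exactly the problem.)

/*
 * Rough sketch only: scan a relation segment file and apply page-header
 * sanity checks similar in spirit to the ones behind the "invalid page
 * header" error. The struct abbreviates PostgreSQL's PageHeaderData
 * layout; don't treat the fields or sizes as exact.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

typedef struct
{
    uint32_t pd_lsn_hi;           /* WAL position of last change */
    uint32_t pd_lsn_lo;
    uint16_t pd_tli;              /* (layout assumed; see bufpage.h) */
    uint16_t pd_flags;
    uint16_t pd_lower;            /* start of free space */
    uint16_t pd_upper;            /* end of free space */
    uint16_t pd_special;          /* start of special space */
    uint16_t pd_pagesize_version;
} PageHeaderSketch;

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <segment-file>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "rb");
    if (!f)
    {
        perror("fopen");
        return 1;
    }

    unsigned char page[BLCKSZ];
    long blkno = 0;

    while (fread(page, 1, BLCKSZ, f) == BLCKSZ)
    {
        PageHeaderSketch hdr;
        memcpy(&hdr, page, sizeof(hdr));

        /* roughly the sanity conditions a page header must satisfy */
        int ok = hdr.pd_lower >= sizeof(PageHeaderSketch) &&
                 hdr.pd_lower <= hdr.pd_upper &&
                 hdr.pd_upper <= hdr.pd_special &&
                 hdr.pd_special <= BLCKSZ;

        if (!ok)
            printf("block %ld: header looks bogus "
                   "(lower=%u upper=%u special=%u)\n",
                   blkno, (unsigned) hdr.pd_lower,
                   (unsigned) hdr.pd_upper, (unsigned) hdr.pd_special);
        blkno++;
    }

    fclose(f);
    return 0;
}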
Make sense?
* For example, WD Caviar disks a few years ago used to spin down without
request from the OS as a power-saving measure. This was OK with most
OSes, but RAID cards tended to treat them as failed and drop them from
the array. Multiple-disk-failure array death quickly resulted. Yes, I
had some of those, too - I haven't been lucky with disks.
--
Craig Ringer