Re: Lost rows/data corruption?

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Andrew Hall <temp02(at)bluereef(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Lost rows/data corruption?
Date: 2005-02-16 12:46:55
Message-ID: Pine.LNX.4.61.0502161245020.18326@Megathlon.ESI
Lists: pgsql-general

On Wed, 16 Feb 2005, Andrew Hall wrote:

> fsync is on for all these boxes. Our customers run their own hardware with
> many different specification of hardware in use. Many of our customers don't
> have UPS, although their power is probably pretty reliable (normal city based
> utilities), but of course I can't guarantee they don't get an outage once in
> a while with a thunderstorm etc.

I see. Well I can't help much, then, I don't run PG on XFS. I suggest testing
on a different FS, to exclude XFS problems. But with fsync on, the FS has
very little to do with reliability, unless it _lies_ about fsync(). Any
FS should return from fsync only after data is on disc, journal or not
(there might be issues with meta-data, but it's hardly a problem with PG).

It's more likely that the hardware (IDE disks) lies about the data being
on the platter. But again, that only matters in case of a sudden power-off.
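To make the fsync() contract concrete, here is a minimal Python sketch of
"return only after the data has been handed to stable storage". The
function name durable_write and the temporary file are made up for
illustration; this is not how PG itself writes its files, only the same
system call used the same way.

```python
import os
import tempfile


def durable_write(path, data: bytes) -> int:
    """Write data to path and return only after fsync() has succeeded.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Ask the kernel to push the file's data to the storage device.
        # An honest FS returns from this call only once that is done.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    print(durable_write(demo_path, b"committed record"))
```

Note that the program can only see what fsync() reports. If the drive
acknowledges the write while the data is still in its own cache, nothing
at this level can tell the difference.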

[...]
> this condition. I'd be really surprised if XFS is the problem as I know there
> are plenty of other people across the world using it reliably with PG.

This is kind of OT, but I don't follow your logic here.

I don't see why plenty of success stories of XFS+PG suggest to you
the culprit is PG. To me it's still 50% - 50%. :-)

Moreover, XFS is continuously updated (it follows the normal, fast Linux
kernel release cycle, like any other Linux FS), so someone else's success
is hardly a data point unless they are using _exactly_ the same versions
as you do.

For example, in the kernel changelog from 2.6.7 to 2.6.10 you can read:

"[XFS] Fix a race condition in the undo-delayed-write buffer routine."

"[XFS] Fix up memory allocators to be more resilient."

"[XFS] Fix a possible data loss issue after an unaligned unwritten
extent write."

"[XFS] handle inode creating race"

(only a few of them)

Now, I don't have even the faintest idea whether any of that might have
affected you or not, but still the point is that the Linux kernel changes
a lot. And vendors tend to customize their kernels a lot, too. On the
PostgreSQL side, releases are paced more slowly, so it's easier to compare.

Anyway, I agree your problem is weird, and that it must be something
on the server side.
No matter what you do on the client side (pool manager, JDBC driver,
servlet engine), in no way should the DB end up corrupted with duplicated
primary keys.
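As a rough illustration of that point, here is a small Python sketch. It
uses an in-memory SQLite database purely as a stand-in (an assumption made
so the example runs anywhere; it is not PG, and the table name demo is
made up). It shows the server side rejecting a duplicate key regardless of
what the client attempts, and the kind of GROUP BY / HAVING query one can
use to look for duplicated keys in a table.

```python
import sqlite3


def check_primary_key_enforcement():
    """Return (rejected, duplicates) for a hypothetical table 'demo'."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL
    try:
        conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, val TEXT)")
        conn.execute("INSERT INTO demo (id, val) VALUES (1, 'first')")
        try:
            # A misbehaving client tries to reuse an existing key.
            conn.execute("INSERT INTO demo (id, val) VALUES (1, 'second')")
            rejected = False
        except sqlite3.IntegrityError:
            # The database refuses, whatever the client does.
            rejected = True
        # List any key value that appears more than once.
        duplicates = conn.execute(
            "SELECT id, COUNT(*) FROM demo GROUP BY id HAVING COUNT(*) > 1"
        ).fetchall()
        return rejected, duplicates
    finally:
        conn.close()


if __name__ == "__main__":
    print(check_primary_key_enforcement())
```

On a healthy database the duplicate insert is rejected and the query
returns no rows. The same GROUP BY / HAVING form is standard SQL, so it
can be run against the affected PG table with the real table and key
column substituted.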

I know this is a silly question, but when you write 'We do nothing with
any indexes', do you mean the indexes are never, _never_ touched (I mean
explicitly, as in drop/create index), i.e. they are created at schema
creation time and then left alone? Just to make sure...

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo(at)ESI(dot)it
