(Again) Datacorruption using 7.4.2 on XFS/raid1

From:	"Florian G. Pflug" <fgp(at)phlo(dot)org>
To:	pgsql-general(at)postgresql(dot)org
Subject:	(Again) Datacorruption using 7.4.2 on XFS/raid1
Date:	2004-07-12 18:31:15
Message-ID:	20040712183115.GA3913@foobar.solution-x.com
Lists:	pgsql-general

We have again experienced data corruption using 7.4.2 on an XFS filesystem
on top of a software RAID-1 (md) array.

After a server crash last night (a rather strange crash: the machine
was still pingable, but no login was possible, and postgres and apache
no longer responded to requests), we hard-reset the machine. It came up
again normally, but a few hours later the following errors occurred when
trying to access certain tables. (Those tables are updated heavily: each day
about 2 million tuples are inserted and the old versions of those tuples
are deleted.)

ERROR: could not access status of transaction 34048
DETAIL: could not open file "/var/lib/postgres/data/pg_clog/0000": No such file or directory
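Why does transaction 34048 point at segment 0000? The commit status of each transaction is kept in pg_clog at two bits per transaction, packed into pages, with each segment file named by its number in four hex digits. The minimal C sketch below shows that mapping. It assumes the default 8 kB BLCKSZ and 32 pages per segment. The helper names are hypothetical, not PostgreSQL's own functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed 7.4-era defaults: 8 kB blocks, 2 status bits per transaction
 * (so 4 transactions per byte), 32 pages per pg_clog segment file.
 * A build with a non-default BLCKSZ would use different numbers. */
#define BLCKSZ                 8192u
#define CLOG_XACTS_PER_BYTE    4u
#define CLOG_XACTS_PER_PAGE    (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define SLRU_PAGES_PER_SEGMENT 32u

typedef uint32_t TransactionId;

/* Hypothetical helper: number of the segment file holding xid's status. */
static unsigned clog_segment_for_xid(TransactionId xid)
{
    return (xid / CLOG_XACTS_PER_PAGE) / SLRU_PAGES_PER_SEGMENT;
}

/* Hypothetical helper: page within that segment file. */
static unsigned clog_page_in_segment(TransactionId xid)
{
    return (xid / CLOG_XACTS_PER_PAGE) % SLRU_PAGES_PER_SEGMENT;
}

/* Formats the segment number as pg_clog names its files (4 hex digits). */
static void clog_segment_name(TransactionId xid, char buf[8])
{
    snprintf(buf, 8, "%04X", clog_segment_for_xid(xid));
}
```

Under these assumptions each segment covers 1,048,576 transaction IDs. Transaction 34048 therefore sits on page 1 of segment 0000, and any status lookup that has to read that segment from disk fails once the file is gone.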

While reading linux-kernel today, I stumbled upon a description of a rather
strange XFS behaviour: it apparently zeroes a block if the block was updated
and the corresponding metadata update was flushed to disk, but the data
itself was not.
This does not happen if the file is fsync()ed after the update, but I was
wondering what would happen if the machine crashed between the write() and
the fsync().
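The minimal POSIX C sketch below shows the window between write() and fsync() that the question refers to. The function name write_durably and its truncate-and-rewrite behaviour are illustrative assumptions, not how PostgreSQL writes its own files. A comment marks the point where, per the lkml description, a crash could leave zeroed blocks on XFS.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes to path and fsync()s before
 * returning. Returns 0 on success, -1 on error (errno is set). */
static int write_durably(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        done += (size_t)n;
    }

    /* Crash window: write() has returned, but the data may exist only in
     * the page cache. If the metadata update has already been flushed to
     * disk and the data has not, a crash here is the case described in the
     * lkml thread, where the block may read back as zeros afterwards. */

    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Once fsync() succeeds, data and metadata should both be on stable
     * storage, provided the disks honour cache flushes. */
    return close(fd);
}
```

Durability is only promised once fsync() returns successfully. The open question above is whether a crash inside that window can also replace previously valid data with zeros.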

The lkml thread about this can be found here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/0359.html

Could this XFS behaviour cause the postgres problems we are seeing?

greetings, Florian Pflug

Responses

Re: (Again) Datacorruption using 7.4.2 on XFS/raid1 at 2004-07-12 19:22:02 from Brian Hirt
Re: (Again) Datacorruption using 7.4.2 on XFS/raid1 at 2004-07-12 20:38:43 from Ian Barwick
Re: (Again) Datacorruption using 7.4.2 on XFS/raid1 at 2004-07-13 13:11:25 from Florian G. Pflug