Re: (Again) Datacorruption using 7.4.2 on XFS/raid1

From: Brian Hirt <bhirt(at)mobygames(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: (Again) Datacorruption using 7.4.2 on XFS/raid1
Date: 2004-07-12 19:22:02
Message-ID: C0C38A2F-D438-11D8-9804-000D93AD2E74@mobygames.com
Lists: pgsql-general

FYI, I have seen Linux software RAID fail to detect failed drives and
cause filesystem corruption on many occasions. I would recommend staying
away from it. Maybe what you describe is a problem with PG, but I
doubt it.

On Jul 12, 2004, at 12:31 PM, Florian G. Pflug wrote:

> Hi
>
> We have again experienced data corruption using 7.4.2 on an XFS
> filesystem on top of a software-raid (md) raid-1.
>
> After a server crash last night (a rather strange one: the machine was
> still pingable, but no login was possible, and postgres and apache no
> longer responded to requests) we hard-reset the machine. It came up
> again nicely, but a few hours later the following errors occurred when
> trying to access certain tables. (Those tables are updated heavily: each
> day about 2 million tuples are inserted, and the old versions of those
> tuples deleted.)
>
> ERROR:  could not access status of transaction 34048
> DETAIL:  could not open file "/var/lib/postgres/data/pg_clog/0000": No such file or directory
>
> While reading linux-kernel today, I stumbled upon a description of a
> rather strange XFS behaviour. It seems to zero a block if the block was
> updated and the corresponding metadata update was flushed to disk, but
> not the data itself.
> It does not happen if the file is fsync()ed after the update, but I was
> wondering what would happen if the machine crashed between the write()
> and the fsync().
>
> The lkml thread about this can be found here:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/0359.html
>
> Could this XFS behaviour cause the postgres problems we are seeing?
>
> greetings, Florian Pflug
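
For readers following the write()/fsync() question in the quoted message, here is a minimal sketch of the window being discussed. It is not PostgreSQL's code, and the helper name `durable_write` is made up for illustration. It assumes a POSIX system where a directory can be opened read-only and passed to fsync(). The point is only that data handed to write() should not be treated as safe until fsync() has returned; a crash before that point can leave the file in whatever state the filesystem chooses.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path; return only after fsync() has succeeded.

    Hypothetical helper for illustration, not taken from PostgreSQL.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() may accept fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Crash window: up to here the data may exist only in the page
        # cache. Only after fsync() returns can it be assumed on disk.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Assumption: POSIX semantics. Syncing the parent directory makes the
    # directory entry of a newly created file durable as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        durable_write(target, b"payload")
        with open(target, "rb") as fh:
            print(fh.read())
```

Whether a crash inside that window can explain the missing pg_clog file is a separate question that this sketch does not answer.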
