PG on NFS may be just a bad idea

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org, pgsql-novice(at)postgreSQL(dot)org, Mija Lee <mija(at)scharp(dot)org>
Subject:	PG on NFS may be just a bad idea
Date:	2007-09-29 03:58:22
Message-ID:	25517.1191038302@sss.pgh.pa.us
Lists:	pgsql-docs pgsql-hackers pgsql-novice

I spent a bit of time tonight poking at the issue reported here:
http://archives.postgresql.org/pgsql-novice/2007-08/msg00123.php

It turns out to be quite easy to reproduce, at least for me: start CVS
HEAD on an NFS-mounted $PGDATA directory, and run the contrib regression
tests ("make installcheck" in contrib/). I see more than half of the
DROP DATABASE commands complaining in exactly the way Mija describes.
This failure rate might be an artifact of the particular environment
(I tested NFS client = Fedora Core 6, server = HPUX 10.20 on a much
slower machine) but the problem is clearly real.

In the earlier thread I cited suggestions that this behavior comes from
client programs holding files open longer than they should. However,
strace'ing this behavior shows no evidence at all that that is happening
in Postgres. I have an strace that shows conclusively that the bgwriter
never opened any file in the target database at all, and all earlier
backends exited before the one doing the DROP DATABASE began its dirty
work, and yet:

[pid 19211] 22:50:30.517077 rmdir("base/18193") = -1 ENOTEMPTY (Directory not empty)
[pid 19211] 22:50:30.517863 write(2, "WARNING: could not remove file "..., 79WARNING: could not remove file or directory "base/18193": Directory not empty
) = 79
[pid 19211] 22:50:30.517974 sendto(7, "N\0\0\0rSWARNING\0C01000\0Mcould not "..., 115, 0, NULL, 0) = 115

After some googling I think that the damage may actually be getting done
at the kernel level. According to
http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
it is fairly common for NFS clients to cache writes, meaning that the
kernel itself may be holding an old write and not sending it to the NFS
server until after the file deletion command has been sent.

(I don't have the network-fu needed to prove that this is happening by
sniffing the network traffic; anyone want to try?)

If this is what's happening I'd claim it is a kernel bug, but seeing
that I see it on FC6 and Mija sees it on Solaris 10, it would be a bug
widespread enough that we'd not be likely to get it killed off soon.

Maybe we need to actively discourage people from running Postgres
against NFS-mounted data directories. Shane Kerr's paper cited above
mentions some other rather scary properties, including O_EXCL file
creation not really working properly.
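
To make the O_EXCL concern concrete, here is a small Python sketch of the usual exclusive-create idiom. The helper name create_exclusive is hypothetical. On a local filesystem the second call for the same path is expected to fail; the paper's claim is that this guarantee may not hold on NFS, which the sketch itself does not demonstrate.

```python
import os


def create_exclusive(path):
    """Create `path` only if it does not already exist.

    Returns True if this call created the file, False if it already existed.
    Correctness depends on O_CREAT | O_EXCL being atomic, which is the
    property reported as unreliable on NFS.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        # Someone else (or an earlier call) already created the file.
        return False
    os.close(fd)
    return True
```

Code that uses such a file as a lock assumes that exactly one caller ever gets True for a given path.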

regards, tom lane

Responses

Re: PG on NFS may be just a bad idea at 2007-09-29 14:23:59 from Zdenek Kotala
Re: [HACKERS] PG on NFS may be just a bad idea at 2007-10-01 17:13:41 from Josh Berkus
Re: [HACKERS] PG on NFS may be just a bad idea at 2007-11-04 21:51:38 from Bruce Momjian
