Re: Fwd: Re: A new look at old NFS readdir() problems?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Larry Rosenman <ler(at)lerctr(dot)org>, Pgsql hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fwd: Re: A new look at old NFS readdir() problems?
Date: 2025-01-02 23:08:50
Message-ID: CA+hUKG+rBEN5Z29AGHHGTaKa35nQugCQuGoOSQWGQNz=Rc8_nA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 3, 2025 at 10:53 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> (To be clear: if this is how FreeBSD acts, then I'm afraid we already
> do have such bugs. The rmtree case is just easier to observe than a
> missed fsync.)

For what little it's worth, I'm not quite convinced yet that FreeBSD's
client isn't more broken than it needs to be. Lots of systems I
looked at have stable cookies in practice (as NFSv4 recommends),
including the one used in this report, so it seems like a more basic
problem. At the risk of being wrong on the internet, I don't see any
fundamental reason why a readdir() scan can't have no-skip,
no-duplicate, no-fail semantics for stable-cookie, no-verification
servers. And this case works perfectly with a couple of other NFS
client implementations that you and I tried.
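
To make the stable-cookie argument concrete, here is a toy model rather
than anything resembling a real NFS client or server. The names
StableCookieDir, OffsetCookieDir and scan_and_unlink are invented for
illustration. It assumes a "stable" cookie is one that is never
renumbered or reused, and contrasts it with a cookie that is only a
position in the current listing, while an rmtree-style loop unlinks
entries between batches.

```python
class StableCookieDir:
    """Toy directory whose cookies are never renumbered or reused (assumption)."""

    def __init__(self, names):
        self.entries = {}          # cookie -> name
        self.next_cookie = 1
        for name in names:
            self.create(name)

    def create(self, name):
        self.entries[self.next_cookie] = name
        self.next_cookie += 1

    def unlink(self, name):
        for cookie, existing in list(self.entries.items()):
            if existing == name:
                del self.entries[cookie]
                return
        raise FileNotFoundError(name)

    def readdir(self, cookie, count):
        """Return up to `count` (cookie, name) pairs that sort after `cookie`."""
        later = sorted(c for c in self.entries if c > cookie)
        return [(c, self.entries[c]) for c in later[:count]]


class OffsetCookieDir:
    """Toy directory whose cookie is just an index into the current listing."""

    def __init__(self, names):
        self.names = list(names)

    def unlink(self, name):
        self.names.remove(name)    # later entries shift down by one

    def readdir(self, cookie, count):
        chunk = self.names[cookie:cookie + count]
        return [(cookie + i + 1, n) for i, n in enumerate(chunk)]


def scan_and_unlink(directory, batch=2):
    """rmtree-style scan: read a batch, unlink what was read, resume from last cookie."""
    seen, cookie = [], 0
    while True:
        chunk = directory.readdir(cookie, batch)
        if not chunk:
            return seen
        for cookie, name in chunk:
            seen.append(name)
        for _, name in chunk:
            directory.unlink(name)


stable_result = scan_and_unlink(StableCookieDir("abcdef"))
offset_result = scan_and_unlink(OffsetCookieDir("abcdef"))
print(stable_result)   # every entry visited exactly once
print(offset_result)   # entries that shifted under the scan are skipped
```

In this model the stable-cookie scan visits every entry exactly once,
while the offset-cookie scan silently skips the entries that slid into
positions it had already passed.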

As for systems that don't have stable cookies, well then they should
implement the cookie verification scheme and AFAICS the readdir() scan
should then fail if it can't recover, or it would expose isolation
anomalies offensive to database hacker sensibilities. I think it
should be theoretically possible to recover in some happy cases
(maybe: when enough state is still around in cache to deduplicate).
But that shouldn't be necessary on filers using e.g. ZFS or Btrfs whose
cookies are intended to be stable. A server could also do MVCC magic,
and from a quick google, I guess NetApp WAFL might do that as it has
the concept of "READDIR expired", which smells a bit like ORA-01555:
snapshot too old.
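
A minimal sketch of the "fail if it can't recover" behaviour, again as a
toy model and not any real server's implementation. VerifiedDir,
BadCookie and scan are hypothetical names. It assumes the server bumps a
verifier on every directory change and rejects a continuation request
carrying a stale one, so the scan errors out instead of quietly
skipping entries.

```python
class BadCookie(Exception):
    """Raised when the client's verifier no longer matches the directory."""


class VerifiedDir:
    """Toy directory with unstable (positional) cookies plus a verifier."""

    def __init__(self, names):
        self.names = list(names)
        self.verifier = 0

    def unlink(self, name):
        self.names.remove(name)
        self.verifier += 1         # any change invalidates outstanding cookies

    def readdir(self, cookie, verifier, count):
        # A scan starting from cookie 0 is always accepted; a continuation
        # must present the verifier that was current when its cookie was issued.
        if cookie != 0 and verifier != self.verifier:
            raise BadCookie("directory changed since cookie was issued")
        chunk = self.names[cookie:cookie + count]
        entries = [(cookie + i + 1, n) for i, n in enumerate(chunk)]
        return self.verifier, entries


def scan(directory, batch=2, mutate=None):
    """Read the directory in batches; optionally mutate it between batches."""
    seen, cookie, verifier = [], 0, 0
    while True:
        verifier, chunk = directory.readdir(cookie, verifier, batch)
        if not chunk:
            return seen
        for cookie, name in chunk:
            seen.append(name)
        if mutate is not None:
            mutate(directory)


print(scan(VerifiedDir("abcdef")))            # undisturbed scan completes
try:
    scan(VerifiedDir("abcdef"), mutate=lambda d: d.unlink(d.names[0]))
except BadCookie as exc:
    print("scan failed:", exc)                # concurrent change is reported
```

A client that wanted to recover rather than fail would have to restart
from cookie 0 and deduplicate against what it had already returned,
which is the "happy case" that depends on enough state still being
around.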
