Re: Strange issue with NFS mounted PGDATA on ugreen NAS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, Larry Rosenman <ler(at)lerctr(dot)org>, Pgsql hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Strange issue with NFS mounted PGDATA on ugreen NAS
Date: 2025-01-02 15:38:29
Message-ID: 267244.1735832309@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> I now suspect this specific readdir() problem is in FreeBSD's NFS
> client. See below. There have also been reports of missed files from
> (IIRC) Linux clients without much analysis, but that doesn't seem too
> actionable from here unless someone can come up with a repro or at
> least some solid details to investigate; those involved unspecified
> (possibly appliance/cloud) NFS and CIFS file servers.

I forgot to report back, but yesterday I spent time unsuccessfully
trying to reproduce the problem with a macOS client and an NFS server
using btrfs (a Synology NAS running some no-name version of Linux).
So that lends additional weight to your conclusion that it isn't
specifically a btrfs bug.
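For anyone else attempting a reproduction, a minimal sketch of a test in this spirit follows. It assumes (unverified) that the trigger is the pattern visible in the trace quoted below: list a directory, unlink its entries while the listing is in progress, then rmdir. The function name, the file names, and the mount path in the usage comment are hypothetical, and it has not been checked against the FreeBSD client behaviour discussed here.

```python
import errno
import os
import sys


def unlink_while_listing(dirpath: str, nfiles: int = 2000) -> list[str]:
    """Hypothetical reproducer: create `nfiles` empty files in a fresh
    directory `dirpath`, unlink each one as the directory stream returns it,
    then try to rmdir the directory.

    Returns [] on success, or the sorted names still present if rmdir fails
    because the directory is not empty."""
    os.mkdir(dirpath)  # must not already exist
    for i in range(nfiles):
        # Illustrative names only, loosely modeled on relation fork files.
        with open(os.path.join(dirpath, f"{i}_fsm"), "w"):
            pass

    # Unlink entries while the directory listing is still being iterated.
    with os.scandir(dirpath) as it:
        for entry in it:
            os.unlink(entry.path)

    try:
        os.rmdir(dirpath)
        return []
    except OSError as e:
        # POSIX allows either ENOTEMPTY or EEXIST for a non-empty directory.
        if e.errno not in (errno.ENOTEMPTY, errno.EEXIST):
            raise
        return sorted(os.listdir(dirpath))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: point at a not-yet-existing path on the NFS mount,
    # e.g.  python3 repro.py /mnt/nas/repro_dir
    leftovers = unlink_while_listing(sys.argv[1])
    if leftovers:
        print(f"rmdir failed; {len(leftovers)} entries left, e.g. {leftovers[:5]}")
    else:
        print("ok: directory removed")
```

On a well-behaved local filesystem this should always return an empty list; a non-empty result on an NFS mount would be the kind of minimised evidence worth reporting upstream.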

> I see this issue here with a FreeBSD client talking to a Debian server
> exporting BTRFS or XFS, even with dirreadsize set high so that
> multi-request paging is not expected. Looking at Wireshark and the
> NFS spec (disclaimer: I have never studied NFS at this level before,
> so take this with a grain of salt), what I see is a READDIR request with cookie=0
> (good), and which receives a response containing the whole directory
> listing and a final entry marker eof=1 (good), but then FreeBSD
> unexpectedly (to me) sends *another* READDIR request with cookie=662,
> which is a real cookie that was received somewhere in the middle of
> the first response on the entry for "13816_fsm", and that entry was
> followed by an entry for "13816_vm". The second request gets a
> response that begins at "13816_vm" (correct on the server's part).
> Then the client sends REMOVE (unlink) requests for some but not all of
> the files, including "13816_fsm" but not "13816_vm". Then it sends
> yet another READDIR request with cookie=0 (meaning go from the top),
> and gets a non-empty directory listing, but immediately sends RMDIR,
> which unsurprisingly fails with NFS3ERR_NOTEMPTY. So my best guess so far
> is that FreeBSD's NFS client must be corrupting its directory cache
> when files are unlinked, and it's not the server's fault. I don't see
> any obvious problem with the way the cookies work. Seems like
> material for a minimised bug report elsewhere, and not our issue.
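To make the cookie/eof sequence in the trace easier to follow, here is a toy model of NFSv3 READDIR paging as RFC 1813 describes it: the first request uses cookie 0, each returned entry carries a cookie that lets the client resume after that entry, and eof tells the client the listing is complete. This is not a real NFS implementation; it drops cookieverf and byte-based size limits, and the "cookie = entry offset" scheme is purely an assumption for illustration (real cookies such as 662 are opaque server values).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    cookie: int  # opaque to the client; passed back to resume *after* this entry


class ToyServer:
    """Toy stand-in for a server's READDIR handler (not a real NFS server).

    Assumption for illustration: cookie k means "resume after the k-th entry",
    and cookie 0 means "start from the top"."""

    def __init__(self, names: list[str]) -> None:
        self.names = list(names)

    def readdir(self, cookie: int, max_entries: int) -> tuple[list[Entry], bool]:
        chunk = self.names[cookie:cookie + max_entries]
        entries = [Entry(n, cookie + i + 1) for i, n in enumerate(chunk)]
        eof = cookie + len(chunk) >= len(self.names)
        return entries, eof


def list_directory(server: ToyServer, max_entries: int) -> tuple[list[str], int]:
    """Client-side paging loop: start at cookie 0, resume from the last
    entry's cookie, and stop as soon as the server reports eof."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    names: list[str] = []
    cookie, requests = 0, 0
    while True:
        entries, eof = server.readdir(cookie, max_entries)
        requests += 1
        names.extend(e.name for e in entries)
        if eof:
            # After eof=1 the listing is complete; no further READDIR is needed.
            return names, requests
        cookie = entries[-1].cookie
```

With the four names "13816", "13816_fsm", "13816_vm", "PG_VERSION" and a large max_entries, one request returns everything with eof set, which is why a follow-up READDIR after eof=1 looks surprising. Resuming with the cookie of "13816_fsm" (2 in this toy scheme) yields a listing that begins at "13816_vm", matching the server behaviour seen in the trace.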

Yeah, that seems pretty damning. Thanks for looking into it.

regards, tom lane
