Re: Fwd: Re: A new look at old NFS readdir() problems?

From: David Steele <david(at)pgbackrest(dot)org>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Larry Rosenman <ler(at)lerctr(dot)org>, Pgsql hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <tmunro(at)freebsd(dot)org>
Subject: Re: Fwd: Re: A new look at old NFS readdir() problems?
Date: 2025-01-03 16:48:15
Message-ID: cb4d3296-3021-4d75-9de8-58f43a1e1d7b@pgbackrest.org
Lists: pgsql-hackers

On 1/3/25 09:47, Greg Sabino Mullane wrote:
> On Fri, Jan 3, 2025 at 8:33 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> We tried to make our code as robust as it could be in the face of
>> kernel code that behaved in a manner that was fairly ridiculous
>> relative to our needs. This case doesn't seem that different to me.
>
> +1. Seems a shame that freebsd chooses such "optimizations", but making
> our code do various workarounds and jump through hoops to support
> various OS quirks (hello Win32 fans!) seems a burden we agreed to take
> on a long time ago.

FWIW, we observed this issue in pgBackRest a few years ago -- as you can
imagine, we do a lot of directory scanning, so readdir gets a real workout.

We had one issue reported [1] involving Alpine Linux and CIFS and
another [2] with SLES and NFS. We also had at least one internal report
that involved RHEL and a proprietary storage appliance. I'm not certain
if we received any reports for FreeBSD but it kind of rings a bell.

Over time, and across various reports, it seemed that any storage was
potentially a problem. I resisted the notion that we would have to work
around something that seemed to be an obvious kernel bug, but in the end
I capitulated.

We fixed this by making a snapshot of each directory before performing
any operations on that directory (as has been suggested upthread). One
advantage we have is that our storage code is very centralized, since we
deal with a number of storage types, so there are no readdirs in the
general code base. It was still a pretty major patch [3], but a lot of it
was removing the callbacks that we had used previously and adding
optimizations to reduce memory consumption.
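
To illustrate the snapshot idea for anyone skimming the thread, here is a
minimal sketch in Python. It is not the pgBackRest implementation (that is
C, see [3]); the function names snapshot_dir and remove_files are made up
for this example. The point is only that the directory is read to the end
before any entry in it is modified, so the directory stream is never open
while files are being removed.

```python
import os


def snapshot_dir(path):
    """Read the complete listing of `path` into memory and close the
    directory stream before the caller touches any entry."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def remove_files(path):
    """Remove every regular file in `path`, iterating over a snapshot
    instead of a live directory stream. Returns the names removed."""
    removed = []
    for name in snapshot_dir(path):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories and other non-regular entries
        try:
            os.unlink(full)
        except FileNotFoundError:
            continue  # someone else removed it after the snapshot
        removed.append(name)
    return removed
```

The cost is holding the whole listing in memory, which is presumably why
memory optimizations mattered in the real patch.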

One more thing to note -- we are still assuming that Postgres is running
on storage that is not subject to this issue. Even with our new
methodology, if Postgres is deleting files while we are trying to build a
backup manifest, that could cause us (and base backup) problems. The only
solution I came up with for that problem was to keep reading the
directory until we get two snapshots that match -- not very attractive
but probably workable for pgBackRest. I doubt the same could be said for
Postgres.
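
A rough sketch of the "read until two snapshots match" idea, again in
Python and again purely illustrative: this is not something pgBackRest
does today. The name stable_snapshot, the list_fn parameter, and the retry
limit are all assumptions made up for the example.

```python
import os
import time


def stable_snapshot(path, list_fn=os.listdir, max_attempts=10, delay=0.0):
    """Re-read `path` until two consecutive listings are identical.

    `list_fn` is injectable only so the behavior can be exercised without
    a misbehaving filesystem. Raises RuntimeError if the listing never
    settles within `max_attempts` reads.
    """
    previous = None
    for _ in range(max_attempts):
        current = sorted(list_fn(path))
        if current == previous:
            return current
        previous = current
        if delay:
            time.sleep(delay)  # give concurrent deletes a chance to finish
    raise RuntimeError(f"listing of {path!r} did not stabilize")
```

Two matching reads only make a skipped entry less likely; they do not
prove the listing is complete, and a busy directory may never settle.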

Regards,
-David

---

[1] https://github.com/pgbackrest/pgbackrest/issues/1754
[2] https://github.com/pgbackrest/pgbackrest/issues/1423
[3] https://github.com/pgbackrest/pgbackrest/commit/75623d45
