Re: FileFallocate misbehaving on XFS

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Harris <harmic(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FileFallocate misbehaving on XFS
Date: 2024-12-10 17:36:40
Message-ID: CA+TgmobmZLYZHN+dQhdz=H2pwJb4DCdBabXxhq_ij9suLedApA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 9, 2024 at 7:31 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Pretty unexcited about all of these - XFS is fairly widely used for PG, but
> this problem doesn't seem very common. It seems to me that we're missing
> something that causes this to only happen in a small subset of cases.

I wonder if this is actually pretty common on XFS. I mean, we've
already hit this with at least one EDB customer, and Michael's report
is, as far as I know, independent of that; and he points to a
pgsql-general thread which, AFAIK, is also independent. We don't get
three (or more?) independent reports of that many bugs, so I think
it's not crazy to think that the problem is actually pretty common.
It's probably workload dependent somehow, but for all we know today it
seems like the workload could be as simple as "do enough file
extension and you'll get into trouble eventually" or maybe "do enough
file extension with some level of concurrency and you'll get into
trouble eventually".

> I think the source of this needs to be debugged further before we try to apply
> workarounds in postgres.

Why? It seems to me that this has to be a filesystem bug, and we
should almost certainly adopt one of these ideas from Michael Harris:

- Providing a way to configure PG not to use posix_fallocate at runtime

- In the case of posix_fallocate failing with ENOSPC, fall back to
FileZero (worst case that will fail as well, in which case we will
know that we really are out of space)
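
For concreteness, here is a rough sketch of what those two ideas could look like together. It is not PostgreSQL code: zero_fill() and extend_with_fallback() are hypothetical stand-ins for FileZero and the extension path, and the use_fallocate flag is a made-up placeholder for whatever runtime setting we might add. The only thing it relies on is the documented behavior that posix_fallocate() returns the error number directly rather than setting errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for FileZero: write len zero bytes starting at
 * offset.  Returns 0 on success or an errno value on failure.
 */
static int
zero_fill(int fd, off_t offset, off_t len)
{
    char        buf[8192];

    memset(buf, 0, sizeof(buf));
    while (len > 0)
    {
        size_t      chunk = len < (off_t) sizeof(buf) ? (size_t) len : sizeof(buf);
        ssize_t     n = pwrite(fd, buf, chunk, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return errno;       /* a genuine ENOSPC would surface here */
        }
        if (n == 0)
            return EIO;         /* no progress; give up rather than spin */
        offset += n;
        len -= n;
    }
    return 0;
}

/*
 * Hypothetical extension routine.  If use_fallocate is false (idea 1), skip
 * posix_fallocate entirely.  Otherwise try it, and if it reports ENOSPC
 * (idea 2), retry by writing zeros; if that also fails we really are out
 * of space.
 */
static int
extend_with_fallback(int fd, off_t offset, off_t len, bool use_fallocate)
{
    int         rc;

    if (!use_fallocate)
        return zero_fill(fd, offset, len);

    do
    {
        /* posix_fallocate returns the error number; it does not set errno */
        rc = posix_fallocate(fd, offset, len);
    } while (rc == EINTR);

    if (rc == ENOSPC)
        rc = zero_fill(fd, offset, len);

    return rc;
}
```

The fallback costs an extra pass of writes only in the failure case, so the common path is unchanged.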

Maybe we need some more research to figure out which of those two
things we should do -- I suspect the second one is better but if that
fails then we might need to do the first one -- but I doubt that we
can wait for XFS to fix whatever the issue is here. Our usage of
posix_fallocate doesn't look to be anything more than plain vanilla,
so as between these competing hypotheses:

(1) posix_fallocate is and always has been buggy and you can't rely on it, or
(2) we use posix_fallocate in a way that nobody else has and have hit
an incredibly obscure bug as a result, which will be swiftly patched

...the first seems much more likely.

--
Robert Haas
EDB: http://www.enterprisedb.com
