Re: FileFallocate misbehaving on XFS

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FileFallocate misbehaving on XFS
Date: 2024-12-09 10:06:13
Message-ID: 8aa1d1d7-645f-404b-a8f8-7c49be9acd27@vondra.me
Lists: pgsql-hackers

On 12/9/24 08:34, Michael Harris wrote:
> Hello PG Hackers
>
> Our application has recently migrated to PG16, and we have experienced
> some failed upgrades. The upgrades are performed using pg_upgrade and
> have failed during the phase where the schema is restored into the new
> cluster, with the following error:
>
> pg_restore: error: could not execute query: ERROR: could not extend
> file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with
> FileFallocate(): No space left on device
> HINT: Check free disk space.
>
> This has happened multiple times on different servers, and in each
> case there was plenty of free space available.
>
> We found this thread describing similar issues:
>
> https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
>
> As is the case in that thread, all of the affected databases are using XFS.
>
> One of my colleagues built postgres from source with
> HAVE_POSIX_FALLOCATE not defined, and using that build he was able to
> complete the pg_upgrade, and then switched to a stock postgres build
> after the upgrade. However, as you might expect, after the upgrade we
> have experienced similar errors during regular operation. We make
> heavy use of COPY, which is mentioned in the above discussion as
> pre-allocating files.
>
> We have seen this on both Rocky Linux 8 (kernel 4.18.0) and Rocky
> Linux 9 (kernel 5.14.0).
>
> I am wondering if this bug might be related:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323
>
>> When given an offset of 0 and a length, fallocate (man 2 fallocate) reports ENOSPC if the size of the file + the length to be allocated is greater than the available space.
>
> There is a reproduction procedure at the bottom of the above ubuntu
> thread, and using that procedure I get the same results on both kernel
> 4.18.0 and 5.14.0.
> When calling fallocate with offset zero on an existing file, I get
> ENOSPC even if I am only requesting the same amount of space as the
> file already has.
> If I repeat the experiment with ext4 I don't get that behaviour.
>
> On a surface examination of the code paths leading to the
> FileFallocate call, it does not look like it should be trying to
> allocate already allocated space, but I might have missed something
> there.
>
> Is this already being looked into?
>

This sounds more like an XFS bug (or at least XFS-specific behavior), so
it's not clear to me what we could do about it. If the filesystem reports
a bogus out-of-space error, is there anything we can do at all?
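
For anyone who wants to check a filesystem for the behavior described in
the quoted report, here is a minimal sketch. It is not the procedure from
the Ubuntu bug and not what PostgreSQL does internally; the helper name
probe_fallocate and the scratch-file approach are assumptions made for
illustration. It writes a file, then asks posix_fallocate() for the same
range again starting at offset zero, which should not need any new space.

```python
import errno
import os
import tempfile


def probe_fallocate(directory, size=1024 * 1024):
    """Hypothetical helper: write `size` bytes to a scratch file in
    `directory`, then request the same range again with
    posix_fallocate(fd, 0, size).

    Returns None if the call succeeds, otherwise the errno name
    (for example "ENOSPC").
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * size)
            f.flush()
            os.fsync(f.fileno())  # make sure the blocks are really allocated
            try:
                # Offset 0, length equal to the current file size.
                os.posix_fallocate(f.fileno(), 0, size)
            except OSError as exc:
                return errno.errorcode.get(exc.errno, str(exc.errno))
        return None
    finally:
        os.unlink(path)
```

Per the quoted bug description, the error depends on the file size plus
the requested length exceeding the available space, so `size` would have
to be large relative to the free space in `directory` for this probe to
be meaningful.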

What is not clear to me is why this would affect pg_upgrade at all. We
have the data files split into 1GB segments, and the copy/clone/... goes
one by one. So there shouldn't be more than 1GB of "extra" space needed.
Surely you have more free space than that on the system?
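
To illustrate the 1GB segment arithmetic, here is a rough sketch that
maps a block number to a segment file name. It assumes the default build
settings (8kB blocks, 131072 blocks per segment); the function name
segment_for_block is made up for this example, and the number 1249 is
simply taken from the file name in the error message.

```python
BLCKSZ = 8192         # assumed default block size, in bytes
RELSEG_SIZE = 131072  # assumed default number of blocks per segment file


def segment_for_block(relfilenode, blockno):
    """Return (segment file name, byte offset within that segment)."""
    segno = blockno // RELSEG_SIZE
    # The first segment has no suffix; later ones are named "N.1", "N.2", ...
    name = str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"
    offset = (blockno % RELSEG_SIZE) * BLCKSZ
    return name, offset


# Each segment holds BLCKSZ * RELSEG_SIZE bytes, which is exactly 1GB.
segment_bytes = BLCKSZ * RELSEG_SIZE
```

With these defaults, a file named "1249.1" is the second segment, so the
first block stored in it is block 131072 of the relation.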

regards

--
Tomas Vondra
