Re: FileFallocate misbehaving on XFS

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Michael Harris <harmic(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: FileFallocate misbehaving on XFS
Date: 2024-12-09 16:15:46
Message-ID: qhy5z65zhfui5b7vmwkqclbu7aksdvdkohxnb3bgzflvrnhugv@vy3pyzwpm3uv
Lists: pgsql-hackers

Hi,

On 2024-12-09 15:47:55 +0100, Tomas Vondra wrote:
> On 12/9/24 11:27, Jakub Wartak wrote:
> > On Mon, Dec 9, 2024 at 10:19 AM Michael Harris <harmic(at)gmail(dot)com> wrote:
> >
> > Hi Michael,
> >
> > We found this thread describing similar issues:
> >
> > https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> >
> >
> > We've had some cases here at EDB in the past where an OS vendor
> > blamed XFS AG fragmentation (too many AGs; if one AG doesn't have
> > enough space -> error). Could you perhaps show us the output of the
> > following on that LUN:
> > 1. xfs_info
> > 2. the script from https://www.suse.com/support/kb/doc/?id=000018219,
> >    run for your AG range
> >
>
> But this can be reproduced on a brand new filesystem - I just tried
> creating a 1GB image, creating XFS on it, mounting it, and fallocating a
> 600MB file twice. That fails, and there can't be any real fragmentation.

If I understand XFS correctly, it checks whether there's enough free space
for the whole fallocate() to succeed before it even looks at the file's
current layout. Here's an explanation for why:
https://www.spinics.net/lists/linux-xfs/msg55429.html

The real problem with preallocation failing part way through due to
overcommit of space is that we can't go back and undo the
allocation(s) made by fallocate because when we get ENOSPC we have
lost all the state of the previous allocations made. If fallocate is
filling holes between unwritten extents already in the file, then we
have no way of knowing where the holes we filled were and hence
cannot reliably free the space we've allocated before ENOSPC was
hit.

I.e. reserving space as you go could leave you with some, but not all, of
those allocations made. Pre-reserving the worst-case space needed, ahead of
time, ensures that you have enough space to go through it all.
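To make the trade-off concrete, here is a toy model in Python. It is not XFS code: `ModelFS`, `reserve_as_you_go` and `prereserve_worst_case` are hypothetical names, and metadata overhead and allocation groups are ignored. Filling holes as you go can stop part way with some holes already filled, while reserving the whole range up front is all-or-nothing, but it also fails on a range that is already fully allocated, as in Tomas's repro.

```python
from dataclasses import dataclass, field


@dataclass
class ModelFS:
    """Toy model, NOT XFS code: one file of `nblocks` blocks on a filesystem
    with `free` unallocated blocks. True in `allocated` = block is backed."""
    free: int
    nblocks: int
    allocated: list[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.allocated:
            self.allocated = [False] * self.nblocks


def reserve_as_you_go(fs: ModelFS, start: int, length: int) -> tuple[bool, int]:
    """Fill holes one at a time. On ENOSPC, stop and report how many holes
    were filled; those allocations stay in place (partial success)."""
    filled = 0
    for i in range(start, start + length):
        if fs.allocated[i]:
            continue  # already backed, nothing to allocate
        if fs.free == 0:
            return False, filled  # ENOSPC part way through
        fs.free -= 1
        fs.allocated[i] = True
        filled += 1
    return True, filled


def prereserve_worst_case(fs: ModelFS, start: int, length: int) -> bool:
    """Reserve `length` blocks (worst case: every block is a hole) before
    looking at the layout. All-or-nothing: on ENOSPC nothing has changed."""
    if fs.free < length:
        return False  # ENOSPC up front, even if the range is already backed
    fs.free -= length
    used = 0
    for i in range(start, start + length):
        if not fs.allocated[i]:
            fs.allocated[i] = True
            used += 1
    fs.free += length - used  # return the unused part of the reservation
    return True


if __name__ == "__main__":
    # Tomas's repro scaled down: 10 free blocks, fallocate 6 blocks twice.
    fs = ModelFS(free=10, nblocks=6)
    print(prereserve_worst_case(fs, 0, 6), prereserve_worst_case(fs, 0, 6), fs.free)
    # -> True False 4

    # Holes at 0, 2, 4, 6 but only 3 free blocks: stops after filling 3.
    fs2 = ModelFS(free=3, nblocks=8, allocated=[i % 2 == 1 for i in range(8)])
    print(reserve_as_you_go(fs2, 0, 8), fs2.free)
    # -> (False, 3) 0
```

In the scaled-down case the second pre-reservation fails although it would not need a single new block, which mirrors the 1GB image / 600MB file behavior described above.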

You can't just go through the file [range], compute how much free space you
will need to allocate, and then do a second pass through the file, because
the file layout might have changed concurrently...
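A minimal sketch of why the two-pass idea breaks, again as a toy Python model rather than anything XFS does: `two_pass_fallocate` and `punch_hole_and_reuse` are hypothetical, and the "concurrent" change is simulated deterministically between the passes. Pass 1 counts holes and reserves exactly that much; if a hole appears before pass 2, the reservation is too small and we are back to a partial allocation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelFS:
    """Toy model (not XFS code): a file's blocks, True = already backed by
    space, plus the filesystem's count of free blocks."""
    free: int
    allocated: list[bool]


def two_pass_fallocate(
    fs: ModelFS,
    start: int,
    length: int,
    between_passes: Optional[Callable[[ModelFS], None]] = None,
) -> tuple[bool, int]:
    """Hypothetical scheme: pass 1 counts holes and reserves exactly that many
    blocks; pass 2 fills holes. Returns (success, holes_filled)."""
    needed = sum(1 for i in range(start, start + length) if not fs.allocated[i])
    if fs.free < needed:
        return False, 0  # clean ENOSPC: nothing allocated yet
    fs.free -= needed
    reserved = needed
    if between_passes is not None:
        between_passes(fs)  # the file's layout changes after we counted
    filled = 0
    for i in range(start, start + length):
        if fs.allocated[i]:
            continue
        if reserved == 0:
            return False, filled  # reservation too small: partial allocation
        reserved -= 1
        fs.allocated[i] = True
        filled += 1
    fs.free += reserved  # hand back anything we did not use
    return True, filled


def punch_hole_and_reuse(fs: ModelFS) -> None:
    """Simulated concurrent activity: block 1 is hole-punched and the freed
    space is immediately consumed elsewhere, so fs.free does not change."""
    fs.allocated[1] = False


if __name__ == "__main__":
    fs = ModelFS(free=2, allocated=[False, True, False, True])
    print(two_pass_fallocate(fs, 0, 4, punch_hole_and_reuse), fs.allocated)
    # -> (False, 2) [True, True, False, True]
```

Without the simulated concurrent change the same call succeeds with exactly the reserved space, which is what makes the two-pass approach look attractive until concurrency enters the picture.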

This issue seems independent of the issue Michael is having though. Postgres,
afaik, won't fallocate huge ranges with already allocated space.
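For contrast, a sketch of the extend-at-end-of-file pattern the paragraph above assumes, using Python's `os.posix_fallocate` (available on Linux and some other Unixes). `extend_file` is a hypothetical helper, not Postgres's FileFallocate. Because the range starts at the current EOF it contains no already-allocated blocks, so a worst-case reservation equals the space actually needed.

```python
import os


def extend_file(fd: int, nbytes: int) -> int:
    """Preallocate `nbytes` immediately past the current end of file and
    return the new file size. The range is entirely new, so no part of it
    is already backed by space."""
    old_size = os.fstat(fd).st_size
    # Raises OSError (e.g. with errno ENOSPC) if the space can't be allocated.
    os.posix_fallocate(fd, old_size, nbytes)
    return os.fstat(fd).st_size
```

For example, with a file holding one 8192-byte block (Postgres's default BLCKSZ), `extend_file(fd, 8 * 8192)` grows it to nine blocks.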

Greetings,

Andres Freund
