Re: FileFallocate misbehaving on XFS

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Michael Harris <harmic(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FileFallocate misbehaving on XFS
Date: 2024-12-20 12:25:41
Message-ID: CAKZiRmzWbo_Xcv00_LC-T0xFYwJ3UFJdra7N3G1K3bqCac0qSw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 19, 2024 at 7:49 AM Michael Harris <harmic(at)gmail(dot)com> wrote:

> Hello,
>
> I finally managed to get the patched version installed in a production
> database where the error is occurring very regularly.
>
> Here is a sample of the output:
>
> 2024-12-19 01:08:50 CET [2533222]: LOG: mdzeroextend FileFallocate
> failing with ENOSPC: free space for filesystem containing
> "pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks:
> 2683831808, f_bfree: 205006167, f_bavail: 205006167 f_files:
> 1073741376, f_ffree: 1069933796

[..]

> I have attached a file containing all the errors I collected. The
> error is happening pretty regularly - over 400 times in a ~6 hour
> period. The number of blocks being extended varies from ~9 to ~15, and
> the statfs result shows plenty of available space & inodes at the
> time. The errors do seem to come in bursts.
>

I couldn't resist: you seem to have entered the quantum realm of free disk
space, AKA Schrödinger's free space: you both have the space and don't have
it... ;)

No one else has responded, so I'll try. My take is that we have only a very
limited number of reports (2-3) of this happening, and it always seems to
involve >90% space used (here f_bfree/f_blocks = 205006167/2683831808, i.e.
only ~7.6% free). The adoption of PG16 is rising, so we may or may not see
more errors of this kind, but on the other hand the frequency is so low
that it's really wild we don't see more reports like this one. Lots of OS
upgrades in the wild are performed by building new standbys (which maybe
lowers the filesystem fragmentation) rather than in-place, so to me it
sounds like a new, rare bug in XFS. You could probably live with #undef
HAVE_POSIX_FALLOCATE as a way to survive; another option would be to run
xfs_fsr to defragment the filesystem.
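For the #undef route, a minimal sketch of what I mean (assuming a source
build: edit src/include/pg_config.h after running configure and before
compiling -- HAVE_POSIX_FALLOCATE is the real configure symbol, the rest
is just illustration):

    /* src/include/pg_config.h, as generated by configure */

    /*
     * Undefining this makes the backend skip the posix_fallocate()
     * code path entirely and extend relation files by writing zeroes
     * instead.
     */
    #undef HAVE_POSIX_FALLOCATE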

Longer-term: other than collecting the eBPF data to start digging into
where this is really triggered, I don't see a way forward. It would be
suboptimal to just abandon the fallocate() optimization from commit
31966b151e6ab7a6284deab6e8fe5faddaf2ae4c because of a very unusual
combination of factors (an XFS bug).

Well, we could add some kludge along the lines of this pseudo-code:
if (posix_fallocate() == ENOSPC && statfs().free_space_pct >= 1)
fall_back_to_pwrites(); but it is ugly. Another option is a GUC (or even
two -- how much to extend, and whether to use posix_fallocate() at all),
but people do not like more GUCs...
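Spelled out a bit more, a minimal self-contained sketch of that fallback
(the function names, the statvfs()-based percentage check, and the 1%
threshold are all illustrative assumptions here, not actual PostgreSQL
code):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/statvfs.h>

    /* Zero-fill [offset, offset+len) with pwrite() in fixed-size chunks. */
    static int
    zero_extend_with_pwrite(int fd, off_t offset, off_t len)
    {
        static const char zerobuf[8192];    /* statics are zero-initialized */

        while (len > 0)
        {
            size_t  chunk = len > (off_t) sizeof(zerobuf) ?
                sizeof(zerobuf) : (size_t) len;
            ssize_t written = pwrite(fd, zerobuf, chunk, offset);

            if (written < 0)
                return errno;
            offset += written;
            len -= written;
        }
        return 0;
    }

    /* Percentage of free blocks on the filesystem containing "path". */
    static int
    fs_free_space_pct(const char *path)
    {
        struct statvfs sv;

        if (statvfs(path, &sv) != 0 || sv.f_blocks == 0)
            return -1;          /* unknown: caller will not retry */
        return (int) (sv.f_bavail * 100 / sv.f_blocks);
    }

    static int
    zero_extend(int fd, const char *path, off_t offset, off_t len)
    {
        int     rc = posix_fallocate(fd, offset, len);

        /*
         * If fallocate reports ENOSPC but statvfs() still sees >= 1% free
         * space, treat it as a spurious ENOSPC (the suspected XFS bug)
         * and fall back to writing zeroes.
         */
        if (rc == ENOSPC && fs_free_space_pct(path) >= 1)
            rc = zero_extend_with_pwrite(fd, offset, len);

        return rc;
    }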

> I have so far not installed the bpftrace that Jakub suggested before -
> as I say this is a production machine and I am wary of triggering a
> kernel panic or worse (even though it seems like the risk for that
> would be low?). While a kernel stack trace would no doubt be helpful
> to the XFS developers, from a postgres point of view, would that be
> likely to help us decide what to do about this?[..]

Well, you could try to reproduce this outside of production, or even clone
the storage -- not via backup/restore, but by literally cloning the XFS
LUNs on the storage array itself and attaching them to a separate VM to get
a safe testbed (or even dd(1) some smaller XFS filesystem exhibiting this
behaviour to some other place).

As for eBPF/bpftrace: it is safe (it's sandboxed anyway), and lots of
customers are using it, but as always YMMV.

There's also xfs_fsr, which might help overcome the fragmentation.

You can also experiment with whether -o allocsize helps, or even try -o
allocsize=0 (though that would probably have some negative performance
effects).

-J.
