From: | Michael Harris <harmic(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: FileFallocate misbehaving on XFS |
Date: | 2024-12-19 06:47:13 |
Message-ID: | CADofcAXMX3OuPfbOU98v+nqGRxVWyUB+KrLs3LhPojgxTAntog@mail.gmail.com |
Lists: | pgsql-hackers |
Hello,
I finally managed to get the patched version installed in a production
database where the error is occurring very regularly.
Here is a sample of the output:
2024-12-19 01:08:50 CET [2533222]: LOG: mdzeroextend FileFallocate
failing with ENOSPC: free space for filesystem containing
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks:
2683831808, f_bfree: 205006167, f_bavail: 205006167 f_files:
1073741376, f_ffree: 1069933796
2024-12-19 01:08:50 CET [2533222]: ERROR: could not extend file
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" by 13 blocks,
from 110869 to 110882, using FileFallocate(): No space left on device
2024-12-19 01:08:51 CET [2533246]: LOG: mdzeroextend FileFallocate
failing with ENOSPC: free space for filesystem containing
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks:
2683831808, f_bfree: 205004945, f_bavail: 205004945 f_files:
1073741376, f_ffree: 1069933796
2024-12-19 01:08:51 CET [2533246]: ERROR: could not extend file
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" by 14 blocks,
from 110965 to 110979, using FileFallocate(): No space left on device
2024-12-19 01:08:59 CET [2531320]: LOG: mdzeroextend FileFallocate
failing with ENOSPC: free space for filesystem containing
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks:
2683831808, f_bfree: 204980672, f_bavail: 204980672 f_files:
1073741376, f_ffree: 1069933795
2024-12-19 01:08:59 CET [2531320]: ERROR: could not extend file
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" by 14 blocks,
from 111745 to 111759, using FileFallocate(): No space left on device
2024-12-19 01:09:01 CET [2531331]: LOG: mdzeroextend FileFallocate
failing with ENOSPC: free space for filesystem containing
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks:
2683831808, f_bfree: 204970783, f_bavail: 204970783 f_files:
1073741376, f_ffree: 1069933795
2024-12-19 01:09:01 CET [2531331]: ERROR: could not extend file
"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" by 12 blocks,
from 112045 to 112057, using FileFallocate(): No space left on device
I have attached a file containing all the errors I collected. The
error is happening pretty regularly - over 400 times in a ~6 hour
period. The number of blocks being extended varies from ~9 to ~15, and
the statfs result shows plenty of available space & inodes at the
time. The errors do seem to come in bursts.
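To put the statfs numbers in proportion, here is a minimal sketch (not part of the patch) that parses the first LOG entry of the sample above and reports what fraction of blocks and inodes were still free when FileFallocate() returned ENOSPC. The helper names `parse_statfs_fields` and `free_fractions` are made up for illustration. The log does not include the filesystem block size, so the sketch deliberately reports ratios rather than bytes.

```python
import re

# First LOG entry from the sample above, joined back onto a single line.
LOG_LINE = (
    '2024-12-19 01:08:50 CET [2533222]: LOG: mdzeroextend FileFallocate '
    'failing with ENOSPC: free space for filesystem containing '
    '"pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks: '
    '2683831808, f_bfree: 205006167, f_bavail: 205006167 f_files: '
    '1073741376, f_ffree: 1069933796'
)


def parse_statfs_fields(line):
    """Return the f_* counters found in one log entry as a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(f_\w+): (\d+)", line)}


def free_fractions(fields):
    """Return (fraction of blocks available, fraction of inodes free)."""
    return (
        fields["f_bavail"] / fields["f_blocks"],
        fields["f_ffree"] / fields["f_files"],
    )


fields = parse_statfs_fields(LOG_LINE)
blocks_free, inodes_free = free_fractions(fields)
print(f"blocks available: {blocks_free:.1%}, inodes free: {inodes_free:.1%}")
```

The same two helpers could be run over every LOG entry in the attached file to check whether any failure coincides with a genuinely low free-space reading.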
This is a different system to those I previously provided logs from.
It is also running RHEL8 with a similar configuration to the other
system.
I have so far not installed the bpftrace that Jakub suggested before -
as I say this is a production machine and I am wary of triggering a
kernel panic or worse (even though it seems like the risk for that
would be low?). While a kernel stack trace would no doubt be helpful
to the XFS developers, from a postgres point of view, would that be
likely to help us decide what to do about this?
Cheers
Mike
On Tue, 17 Dec 2024 at 10:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Dec 16, 2024 at 12:52 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I don't see what we gain by requiring guesswork (what does allocating vs
> > zeroing mean, zeroing also allocates disk space after all) to interpret the
> > main error message. My experience is that it's often harder to get the DETAIL
> > than the actual error message (grepping becomes harder due to separate line,
> > terse verbosity is commonly used).
>
> I feel like the normal way that we do this is basically:
>
> could not {name of system call} file \"%s\": %m
>
> e.g.
>
> could not read file \"%s\": %m
>
> I don't know why we should do anything else in this type of case.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
rhel8_fallocate_extended.log | application/octet-stream | 191.5 KB |