Re: FileFallocate misbehaving on XFS

From: Michael Harris <harmic(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FileFallocate misbehaving on XFS
Date: 2024-12-10 06:34:23
Message-ID: CADofcAUOqdrEhZj6-3h3GKz2k7J1pJe4pQ0W-PEibOj2=vrScA@mail.gmail.com
Lists: pgsql-hackers

Hi again

One extra piece of information: I had said that all the machines were
Rocky Linux 8 or Rocky Linux 9, but actually a large number of them
are RHEL8.

Sorry for the confusion.

Of course RL8 is a rebuild of RHEL8 so it is not surprising they would
be behaving similarly.

Cheers
Mike

On Tue, 10 Dec 2024 at 17:28, Michael Harris <harmic(at)gmail(dot)com> wrote:
>
> Hi Andres
>
> Following up on the earlier question about OS upgrade paths - all the
> cases reported so far are either on RL8 (Kernel 4.18.0) or were
> upgraded to RL9 (kernel 5.14.0) and the affected filesystems were
> preserved.
> In fact the RL9 systems were initially built as CentOS 7, and then
> when that went EOL they were upgraded to RL9. The process was as I
> described - the /var/opt filesystem which contained the database was
> preserved, and the root and other OS filesystems were scratched.
> The majority of systems where we have this problem are on RL8.
>
> On Tue, 10 Dec 2024 at 11:31, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Are you using any filesystem quotas?
>
> No.
>
> > It'd be useful to get the xfs_info output that Jakub asked for. Perhaps also
> > xfs_spaceman -c 'freesp -s' /mountpoint
> > xfs_spaceman -c 'health' /mountpoint
> > and df.
>
> I gathered this info from one of the systems that is currently on RL9.
> This system is relatively small compared to some of the others that
> have exhibited this issue, but it is the only one I can access right
> now.
>
> # uname -a
> Linux 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
>
> # xfs_info /dev/mapper/ippvg-ipplv
> meta-data=/dev/mapper/ippvg-ipplv isize=512    agcount=4, agsize=262471424 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=0, sparse=0, rmapbt=0
>          =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
> data     =                       bsize=4096   blocks=1049885696, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=512639, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> # for agno in `seq 0 3`; do xfs_spaceman -c "freesp -s -a $agno" /var/opt; done
>    from      to  extents    blocks    pct
>       1       1    37502     37502   0.15
>       2       3    62647    148377   0.59
>       4       7    87793    465950   1.85
>       8      15   135529   1527172   6.08
>      16      31   184811   3937459  15.67
>      32      63   165979   7330339  29.16
>      64     127   101674   8705691  34.64
>     128     255    15123   2674030  10.64
>     256     511      973    307655   1.22
> total free extents 792031
> total free blocks 25134175
> average free extent size 31.7338
>    from      to  extents    blocks    pct
>       1       1    43895     43895   0.22
>       2       3    59312    141693   0.70
>       4       7    83406    443964   2.20
>       8      15   120804   1362108   6.75
>      16      31   133140   2824317  14.00
>      32      63   118619   5188474  25.71
>      64     127    77960   6751764  33.46
>     128     255    16383   2876626  14.26
>     256     511     1763    546506   2.71
> total free extents 655282
> total free blocks 20179347
> average free extent size 30.7949
>    from      to  extents    blocks    pct
>       1       1    72034     72034   0.26
>       2       3    98158    232135   0.83
>       4       7   126228    666187   2.38
>       8      15   169602   1893007   6.77
>      16      31   180286   3818527  13.65
>      32      63   164529   7276833  26.01
>      64     127   109687   9505160  33.97
>     128     255    22113   3921162  14.02
>     256     511     1901    592052   2.12
> total free extents 944538
> total free blocks 27977097
> average free extent size 29.6199
>    from      to  extents    blocks    pct
>       1       1    51462     51462   0.21
>       2       3    98993    233204   0.93
>       4       7   131578    697655   2.79
>       8      15   178151   1993062   7.97
>      16      31   175718   3680535  14.72
>      32      63   145310   6372468  25.48
>      64     127    89518   7749021  30.99
>     128     255    18926   3415768  13.66
>     256     511     2640    813586   3.25
> total free extents 892296
> total free blocks 25006761
> average free extent size 28.0252
>
> # xfs_spaceman -c 'health' /var/opt
> Health status has not been collected for this filesystem.
>
> > What kind of storage is this on?
>
> As mentioned, there are quite a few systems in different sites, so a
> number of different storage solutions are in use: some with directly
> attached disks, others with SAN solutions.
> The instance I got the printout above from is a VM, but in the other
> site they are all bare metal.
>
> > Was the filesystem ever grown from a smaller size?
>
> I can't say for sure that none of them were, but given the number of
> different systems that have this issue I am confident that would not
> be a common factor.
>
> > Have you checked the filesystem's internal consistency? I.e. something like
> > xfs_repair -n /dev/nvme2n1. It does require the filesystem to be read-only or
> > unmounted though. But corrupted filesystem datastructures certainly could
> > cause spurious ENOSPC.
>
> I executed this on the same system as the above prints came from. It
> did not report any issues.
>
> > Are you using pg_upgrade -j?
>
> Yes, we use -j `nproc`
>
> > I assume the file that actually errors out changes over time? It's always
> > fallocate() that fails?
>
> Yes, correct, on both counts.
>
> > Can you tell us anything about the workload / data? Lots of tiny tables, lots
> > of big tables, write heavy, ...?
>
> It is a write heavy application which stores mostly time series data.
>
> The time series data is partitioned by time; the application writes
> constantly into the 'current' partition, and data is expired by
> removing the oldest partition. Most of the data is written once and
> not updated.
>
> There are quite a lot of these partitioned tables (in the 1000s or
> 10000s, depending on how the application is configured). Individual
> partitions range in size from a few MB to 10s of GB.
>
> Cheers
> Mike.
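
Editorial note: the freesp totals quoted above can be cross-checked against the xfs_info geometry with a little arithmetic. The sketch below is not from the thread. It takes agsize and bsize from the xfs_info output and the per-AG totals from the freesp output as posted, and assumes the four freesp blocks correspond to AGs 0 to 3 in that order (as the `seq 0 3` loop suggests). All names are illustrative.

```python
# Figures copied from the quoted xfs_info / xfs_spaceman output.
BLOCK_SIZE = 4096            # data bsize, in bytes
AG_SIZE_BLOCKS = 262471424   # agsize, in filesystem blocks

# agno -> (total free extents, total free blocks); AG order is an assumption.
AG_FREE = {
    0: (792031, 25134175),
    1: (655282, 20179347),
    2: (944538, 27977097),
    3: (892296, 25006761),
}


def summarize(ag_free, ag_size_blocks, block_size):
    """Return per-AG free-space percentage and average free extent size."""
    rows = []
    for agno, (extents, blocks) in sorted(ag_free.items()):
        avg_blocks = blocks / extents
        rows.append({
            "agno": agno,
            "free_pct": 100.0 * blocks / ag_size_blocks,
            "avg_extent_blocks": avg_blocks,
            "avg_extent_kib": avg_blocks * block_size / 1024,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(AG_FREE, AG_SIZE_BLOCKS, BLOCK_SIZE):
        print("AG {agno}: {free_pct:.2f}% free, average free extent "
              "{avg_extent_blocks:.1f} blocks ({avg_extent_kib:.0f} KiB)".format(**row))
```

With the posted figures, each AG comes out at somewhere between roughly 7.7% and 10.7% free, with an average free extent of about 30 blocks (around 120 KiB). The largest histogram bucket in every AG is 256 to 511 blocks, so no free extent reaches 2 MiB at a 4096-byte block size.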

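Editorial note: for readers who want to probe the failing call outside PostgreSQL, the sketch below extends a file with posix_fallocate() and reports the errno instead of raising. It is not PostgreSQL's code and not something posted in the thread. The 8192-byte block size is an assumption (PostgreSQL's default page size), and the function names are hypothetical.

```python
import errno
import os
import tempfile


def try_extend(path, extend_by_blocks, block_size=8192):
    """Reserve extend_by_blocks blocks at the end of path.

    Returns 0 on success, or the errno (for example errno.ENOSPC)
    reported by posix_fallocate().
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        offset = os.fstat(fd).st_size  # extend from the current end of file
        try:
            os.posix_fallocate(fd, offset, extend_by_blocks * block_size)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        os.close(fd)


def demo():
    """Extend a fresh temporary file by 4 blocks; return (result, size)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "segment")
        result = try_extend(path, 4)
        return result, os.path.getsize(path)


if __name__ == "__main__":
    result, size = demo()
    if result == errno.ENOSPC:
        print("posix_fallocate reported ENOSPC")
    else:
        print("result", result, "file size", size)
```

Pointing try_extend() at a scratch file on the affected filesystem (never at a live data file) would show whether the allocation fails there independently of the database.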