Re: backend stuck in DataFileExtend

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: backend stuck in DataFileExtend
Date: 2024-05-06 22:55:28
Message-ID: CA+hUKG+QczhLLktfiXd9a-OMDRLnqAaz8g6JJGRBnLszrh5Fog@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 7, 2024 at 6:21 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> FWIW: both are running zfs-2.2.3 RPMs from zfsonlinux.org.
...
> Yes, they're running centos7 with the indicated kernels.

So far we've got:

* spurious EIO when opening a file (your previous report)
* hanging with CPU spinning (?) inside pwritev()
* old kernel, bleeding edge ZFS

From an (uninformed) peek at the ZFS code, if it really is spinning
there, it seems like a pretty low-level problem: it has finished the
write, and now is just trying to release (something like our unpin) and
unlock the buffers, which involves various code paths that might touch
various mutexes and spinlocks, and to get stuck like that I guess it's
either corrupted itself or it is deadlocking against something else,
but what? Do you see any other processes (including kernel threads)
with any stuck stacks that might be a deadlock partner?
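
One way to look for a deadlock partner is sketched below. It is only an
illustration, not something from the original report: it walks /proc,
picks out every task (kernel threads included) in state D
(uninterruptible sleep) or R (running, which is what a spinning task
would show), and prints whatever /proc/<pid>/task/<tid>/stack exposes.
The names parse_stat, read_file and suspicious_tasks are made up for
this sketch. It assumes a Linux kernel that provides the stack file and
that the script runs as root; stacks of tasks that are actually running
may be empty or unreliable.

```python
#!/usr/bin/env python3
"""Sketch: list tasks in D or R state together with their kernel stacks."""
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of a /proc stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def read_file(path):
    """Return the file contents, or None if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def suspicious_tasks(proc="/proc", states=("D", "R")):
    """Yield (tid, comm, state, stack) for every task in one of `states`."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            stat = read_file(os.path.join(task_dir, tid, "stat"))
            if stat is None:
                continue
            _, comm, state = parse_stat(stat)
            if state in states:
                stack = read_file(os.path.join(task_dir, tid, "stack"))
                yield int(tid), comm, state, stack or "(no stack available)\n"


if __name__ == "__main__":
    for tid, comm, state, stack in suspicious_tasks():
        print(f"--- tid {tid} ({comm}) state {state}")
        print(stack, end="")
```

Running it a few times in a row and comparing the output would show
which tasks stay stuck in the same place; the script itself will also
appear in the list, in state R.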

While looking around for reported issues I found your abandoned report
against an older ZFS version from a few years ago, same old Linux
version:

https://github.com/openzfs/zfs/issues/11641

I don't know enough to say anything useful about that but it certainly
smells similar...

I see you've been busy reporting lots of issues, which seem to
involve big data, big "recordsize" (= ZFS block sizes), compression
and PostgreSQL:

https://github.com/openzfs/zfs/issues?q=is%3Aissue+author%3Ajustinpryzby