Re: cannot abort transaction 2737414167, it was already committed

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: cannot abort transaction 2737414167, it was already committed
Date:	2023-12-27 22:55:34
Message-ID:	ZYyrZg-Lzoy9w3Fp@pryzbyj2023
Lists:	pgsql-hackers

On Thu, Dec 28, 2023 at 11:33:16AM +1300, Thomas Munro wrote:
> I guess the large object usage isn't directly relevant (that module's
> EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
> know that code well). Everything later is supposed to be about
> closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
> reaches code with this relevant comment:
>
> * Note: smgr_unlink must treat deletion failure as a WARNING, not an
> * ERROR, because we've already decided to commit or abort the current
> * xact.
>
> We don't really have a general ban on ereporting on system call
> failure, though. We've just singled unlink() out. Only a few lines
> above that we call DropRelationsAllBuffers(rels, nrels), and that
> calls smgrnblocks(), and that might need to re-open() the
> relation file to do lseek(SEEK_END), because PostgreSQL itself has no
> tracking of relation size. Hard to say but my best guess is that's
> where you might have got your EIO, assuming you dropped the relation
> in this transaction?

Yeah. In fact I was confused - this was not lo_unlink().
This uses normal tables, so would've done:

"begin;"
"DROP TABLE IF EXISTS %s", tablename
"DELETE FROM cached_objects WHERE cache_name=%s", tablename
"commit;"

> > This is pg16 compiled at efa8f6064, running under centos7. ZFS is 2.2.2,
> > but the pool hasn't been upgraded to use the features new since 2.1.
>
> I've been following recent ZFS stuff from a safe distance as a user.
> AFAIK the extremely hard to hit bug fixed in that very recent release
> didn't technically require the interesting new feature (namely block
> cloning, though I think that helped people find the root cause after a
> phase of false blame?). Anyway, its symptom was some bogus zero
> bytes on read, not a spurious EIO.

As I understand it, the ZFS bug involved bogus bytes which may or may not
be zero. The bug was pre-existing but became easier to hit in 2.2, and is
fixed in 2.2.2 and 2.1.14.
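
Going back to the cleanup path quoted above, here is a minimal sketch in
plain C (not PostgreSQL source) of how sizing a file with open() plus
lseek(SEEK_END) can fail after the commit decision has been made, and of
demoting such failures to warnings as the smgr_unlink comment prescribes.
sketch_nblocks(), sketch_cleanup_after_commit() and SKETCH_BLCKSZ are
hypothetical names used only for illustration; the 8192-byte block size is
assumed to match the default build.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192		/* assumption: default block size */

/*
 * Hypothetical helper, not smgrnblocks(): learn how many blocks a file
 * holds by opening it and seeking to its end.  Returns 0 and stores the
 * count in *nblocks on success; returns -1 with errno set if open() or
 * lseek() fails (ENOENT, EIO, ...).
 */
static int
sketch_nblocks(const char *path, long *nblocks)
{
	int			fd = open(path, O_RDONLY);
	off_t		end;
	int			save_errno;

	if (fd < 0)
		return -1;

	end = lseek(fd, 0, SEEK_END);
	save_errno = errno;			/* close() may overwrite errno */
	close(fd);

	if (end < 0)
	{
		errno = save_errno;
		return -1;
	}
	*nblocks = (long) (end / SKETCH_BLCKSZ);
	return 0;
}

/*
 * Hypothetical post-commit cleanup: the caller has already committed and
 * cannot abort, so every failure is logged as a warning and counted
 * instead of being treated as a hard error.  Returns the warning count.
 */
static int
sketch_cleanup_after_commit(const char *path)
{
	long		nblocks;
	int			warnings = 0;

	if (sketch_nblocks(path, &nblocks) != 0)
	{
		fprintf(stderr, "WARNING: could not size \"%s\": %s\n",
				path, strerror(errno));
		warnings++;
	}
	if (unlink(path) != 0)
	{
		fprintf(stderr, "WARNING: could not remove \"%s\": %s\n",
				path, strerror(errno));
		warnings++;
	}
	return warnings;
}
```

The point of the sketch is only that the size probe is one more system call
that can fail in this phase, the same as unlink(); whether the real code
should downgrade that failure in the same way is the open question.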

--
Justin
