Re: cannot abort transaction 2737414167, it was already committed

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: cannot abort transaction 2737414167, it was already committed
Date: 2023-12-27 22:55:34
Message-ID: ZYyrZg-Lzoy9w3Fp@pryzbyj2023
Lists: pgsql-hackers

On Thu, Dec 28, 2023 at 11:33:16AM +1300, Thomas Munro wrote:
> I guess the large object usage isn't directly relevant (that module's
> EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
> know that code well). Everything later is supposed to be about
> closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
> reaches code with this relevant comment:
>
> * Note: smgr_unlink must treat deletion failure as a WARNING, not an
> * ERROR, because we've already decided to commit or abort the current
> * xact.
>
> We don't really have a general ban on ereporting on system call
> failure, though. We've just singled unlink() out. Only a few lines
> above that we call DropRelationsAllBuffers(rels, nrels), and that
> calls smgrnblocks(), and that might need to re-open() the
> relation file to do lseek(SEEK_END), because PostgreSQL itself has no
> tracking of relation size. Hard to say but my best guess is that's
> where you might have got your EIO, assuming you dropped the relation
> in this transaction?

Yeah. In fact I was confused - this was not lo_unlink().
This uses normal tables, so would've done:

"begin;"
"DROP TABLE IF EXISTS %s", tablename
"DELETE FROM cached_objects WHERE cache_name=%s", tablename
"commit;"

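To make the distinction Thomas describes concrete, here is a rough sketch
in Python (not PostgreSQL code). It assumes an 8192-byte block size, and the
names nblocks_via_lseek and unlink_with_warning are made up for
illustration. The size lookup re-opens the file and seeks to the end, so any
open()/lseek() failure such as EIO propagates as an error, while the unlink
path downgrades failure to a warning:

```python
import os
import warnings

BLCKSZ = 8192  # assumed block size (PostgreSQL's default); hypothetical constant


def nblocks_via_lseek(path):
    """Return the file size in blocks by seeking to the end of the file.

    Any failure in open() or lseek() (ENOENT, EIO, ...) is raised as
    OSError, i.e. it is treated as a hard error.
    """
    fd = os.open(path, os.O_RDONLY)  # the "re-open()" step
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
    return size // BLCKSZ


def unlink_with_warning(path):
    """Remove a file, reporting failure as a warning instead of an error.

    Returns True if the file was removed, False otherwise.
    """
    try:
        os.unlink(path)
        return True
    except OSError as exc:
        warnings.warn(f"could not remove file {path!r}: {exc}")
        return False
```

If something like the first function runs after the commit decision has
already been made, its exception has nowhere safe to go, which would match
the "cannot abort transaction" symptom.
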
> > This is pg16 compiled at efa8f6064, running under centos7. ZFS is 2.2.2,
> > but the pool hasn't been upgraded to use the features new since 2.1.
>
> I've been following recent ZFS stuff from a safe distance as a user.
> AFAIK the extremely hard to hit bug fixed in that very recent release
> didn't technically require the interesting new feature (namely block
> cloning, though I think that helped people find the root cause after a
> phase of false blame?). Anyway, it had for symptom some bogus zero
> bytes on read, not a spurious EIO.

As I understand it, the ZFS bug involved bogus bytes that may or may not
be zero. The bug was pre-existing but became easier to hit in 2.2, and
is fixed in 2.2.2 and 2.1.14.

--
Justin
