cannot abort transaction 2737414167, it was already committed

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: cannot abort transaction 2737414167, it was already committed
Date: 2023-12-27 15:02:25
Message-ID: ZYw8gVOMF9gfp6i5@pryzbyj2023
Lists: pgsql-hackers

We had this:

< 2023-12-25 04:06:20.062 MST telsasoft >ERROR: could not open file "pg_tblspc/16395/PG_16_202307071/16384/121010871": Input/output error
< 2023-12-25 04:06:20.062 MST telsasoft >STATEMENT: commit
< 2023-12-25 04:06:20.062 MST telsasoft >WARNING: AbortTransaction while in COMMIT state
< 2023-12-25 04:06:20.062 MST telsasoft >PANIC: cannot abort transaction 2737414167, it was already committed
< 2023-12-25 04:06:20.473 MST >LOG: server process (PID 14678) was terminated by signal 6: Aborted

The application is a daily cronjob which would've just done:

begin;
lo_unlink(); -- the client-side function called from pygresql;
DELETE FROM tbl WHERE col=%s;
commit;

The table being removed would've been a transient (but not "temporary")
table created ~1 day prior.

It's possible that the filesystem had an IO error, but I can't find any
evidence of that. Postgres is running entirely on zfs, which says:

scan: scrub repaired 0B in 00:07:03 with 0 errors on Mon Dec 25 04:49:07 2023
errors: No known data errors

My main question is why an IO error would cause the DB to abort, rather
than raising an ERROR.

This is pg16 compiled at efa8f6064, running under CentOS 7. ZFS is 2.2.2,
but the pool hasn't been upgraded to use the features new since 2.1.

(gdb) bt
#0 0x00007fc961089387 in raise () from /lib64/libc.so.6
#1 0x00007fc96108aa78 in abort () from /lib64/libc.so.6
#2 0x00000000009438b7 in errfinish (filename=filename@entry=0xac8e20 "xact.c", lineno=lineno@entry=1742, funcname=funcname@entry=0x9a6600 <__func__.32495> "RecordTransactionAbort") at elog.c:604
#3 0x000000000054d6ab in RecordTransactionAbort (isSubXact=isSubXact@entry=false) at xact.c:1741
#4 0x000000000054d7bd in AbortTransaction () at xact.c:2814
#5 0x000000000054e015 in AbortCurrentTransaction () at xact.c:3415
#6 0x0000000000804e4e in PostgresMain (dbname=0x12ea840 "ts", username=0x12ea828 "telsasoft") at postgres.c:4354
#7 0x000000000077bdd6 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4465
#8 BackendStartup (port=0x12e44c0) at postmaster.c:4193
#9 ServerLoop () at postmaster.c:1783
#10 0x000000000077ce9a in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x12ad280) at postmaster.c:1467
#11 0x00000000004ba8b8 in main (argc=3, argv=0x12ad280) at main.c:198

#3 0x000000000054d6ab in RecordTransactionAbort (isSubXact=isSubXact@entry=false) at xact.c:1741
xid = 2737414167
rels = 0x94f549 <hash_seq_init+73>
ndroppedstats = 0
droppedstats = 0x0

#4 0x000000000054d7bd in AbortTransaction () at xact.c:2814
is_parallel_worker = false
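
The log and backtrace above are consistent with an ERROR raised after the
xid had already been marked committed: error recovery then enters
AbortTransaction(), and RecordTransactionAbort() refuses to abort a
committed xid and escalates to PANIC. Below is a toy Python model of that
sequence, not PostgreSQL code. The class and function names are
hypothetical, and the exact point at which the file open failed is an
assumption; only the message texts are taken from the log.

```python
import errno


class Panic(Exception):
    """Stands in for a PANIC-level report, which takes the server down."""


class ToyTransaction:
    """Hypothetical model: commit is made durable first, cleanup runs after."""

    def __init__(self, xid, fail_cleanup=False):
        self.xid = xid
        self.fail_cleanup = fail_cleanup
        self.committed = False
        self.log = []

    def commit(self):
        # Step 1: the commit is recorded. From here on it cannot be undone.
        self.committed = True
        # Step 2 (assumed): post-commit cleanup that touches files on disk.
        if self.fail_cleanup:
            raise OSError(errno.EIO, "Input/output error")
        self.log.append("COMMIT ok")

    def abort(self):
        # Mirrors the check that produced the PANIC in the log above.
        if self.committed:
            raise Panic(
                f"cannot abort transaction {self.xid}, it was already committed"
            )
        self.log.append("ROLLBACK ok")


def run(txn):
    """Run commit; on error, attempt the usual abort-based recovery."""
    try:
        txn.commit()
    except OSError as err:
        txn.log.append(f"ERROR: {err.strerror}")
        txn.log.append("WARNING: AbortTransaction while in COMMIT state")
        txn.abort()  # raises Panic when the commit was already recorded
```

In this model, an ordinary ERROR is only survivable while the transaction
can still be rolled back; once the commit is recorded, the normal recovery
path has nothing safe left to do.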

--
Justin
