Race condition in TransactionIdIsInProgress

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Race condition in TransactionIdIsInProgress
Date: 2022-02-10 16:46:43
Message-ID: 4da7913d-398c-e2ad-d777-f752cf7f0bbb@garret.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

Postgres first records state transaction in CLOG, then removes
transaction from procarray and finally releases locks.
But it can happen that transaction is marked as committed in CLOG,
XMAX_COMMITTED bit is set in modified tuple but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

TransactionIdIsInProgress first checks for cached XID status:

    /*
     * We may have just checked the status of this transaction, so if it is
     * already known to be completed, we can fall out without any access to
     * shared memory.
     */
    if (TransactionIdIsKnownCompleted(xid))
    {
        xc_by_known_xact_inc();
        return false;
    }

This status cache is updated by TransactionLogFetch.
So once transaction status is fetched from CLOG, subsequent invocation
of TransactionIdIsKnownCompleted will return true
even if transaction is not removed from procarray.

The following stack illustrates how it can happen:

#6  0x000055ab7758bbaf in TransactionLogFetch (transactionId=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:88
#7  TransactionLogFetch (transactionId=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:52
#8  0x000055ab7758bc7d in TransactionIdDidAbort
(transactionId=transactionId(at)entry=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:186
#9  0x000055ab77812a43 in TransactionIdIsInProgress (xid=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/storage/ipc/procarray.c:1587
#10 0x000055ab7753de79 in HeapTupleSatisfiesVacuumHorizon
(htup=htup(at)entry=0x7ffe562cec50, buffer=buffer(at)entry=1817,
    dead_after=dead_after(at)entry=0x7ffe562ceb14)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1281
#11 0x000055ab7753e2c5 in HeapTupleSatisfiesNonVacuumable (buffer=1817,
snapshot=0x7ffe562cec70, htup=0x7ffe562cec50)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1449
#12 HeapTupleSatisfiesVisibility (tup=tup(at)entry=0x7ffe562cec50,
snapshot=snapshot(at)entry=0x7ffe562cec70, buffer=buffer(at)entry=1817)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1804
#13 0x000055ab7752f3b5 in heap_hot_search_buffer
(tid=tid(at)entry=0x7ffe562cec2a, relation=relation(at)entry=0x7fa57fbf17c8,
buffer=buffer(at)entry=1817,
    snapshot=snapshot(at)entry=0x7ffe562cec70,
heapTuple=heapTuple(at)entry=0x7ffe562cec50, all_dead=all_dead(at)entry=0x0,
first_call=true)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:1802
#14 0x000055ab775370d8 in heap_index_delete_tuples (rel=0x7fa57fbf17c8,
delstate=<optimized out>)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:7480
#15 0x000055ab7755767a in table_index_delete_tuples
(delstate=0x7ffe562d0160, rel=0x2)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/include/access/tableam.h:1327
#16 _bt_delitems_delete_check (rel=rel(at)entry=0x7fa57fbf4518,
buf=buf(at)entry=1916, heapRel=heapRel(at)entry=0x7fa57fbf17c8,
    delstate=delstate(at)entry=0x7ffe562d0160)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtpage.c:1541
#17 0x000055ab7754e475 in _bt_bottomupdel_pass
(rel=rel(at)entry=0x7fa57fbf4518, buf=buf(at)entry=1916,
heapRel=heapRel(at)entry=0x7fa57fbf17c8, newitemsz=20)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtdedup.c:406
#18 0x000055ab7755010a in _bt_delete_or_dedup_one_page
(rel=0x7fa57fbf4518, heapRel=0x7fa57fbf17c8, insertstate=0x7ffe562d0640,
--Type <RET> for more, q to quit, c to continue without paging--
    simpleonly=<optimized out>, checkingunique=<optimized out>,
uniquedup=<optimized out>, indexUnchanged=true)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:2763
#19 0x000055ab775548c3 in _bt_findinsertloc (heapRel=0x7fa57fbf17c8,
stack=0x55ab7a1cde08, indexUnchanged=true, checkingunique=true,
    insertstate=0x7ffe562d0640, rel=0x7fa57fbf4518)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:904
#20 _bt_doinsert (rel=rel(at)entry=0x7fa57fbf4518,
itup=itup(at)entry=0x55ab7a1cde30,
checkUnique=checkUnique(at)entry=UNIQUE_CHECK_YES,
    indexUnchanged=indexUnchanged(at)entry=true,
heapRel=heapRel(at)entry=0x7fa57fbf17c8)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:255
#21 0x000055ab7755a931 in btinsert (rel=0x7fa57fbf4518,
values=<optimized out>, isnull=<optimized out>, ht_ctid=0x55ab7a1cd068,
heapRel=0x7fa57fbf17c8,
    checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true,
indexInfo=0x55ab7a1cdbd0)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtree.c:199
#22 0x000055ab7769ae59 in ExecInsertIndexTuples
(resultRelInfo=resultRelInfo(at)entry=0x55ab7a1c88b8,
slot=slot(at)entry=0x55ab7a1cd038,
    estate=estate(at)entry=0x55ab7a1c8430, update=update(at)entry=true,
noDupErr=noDupErr(at)entry=false, specConflict=specConflict(at)entry=0x0,
    arbiterIndexes=0x0) at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/executor/execIndexing.c:415

heap_lock_tuple will not wait transaction completion if
HEAP_XMAX_COMMITTED is set in tuple and returns TM_Updated:

        /*
         * Time to sleep on the other transaction/multixact, if necessary.
         *
         * If the other transaction is an update/delete that's already
         * committed, then sleeping cannot possibly do any good: if we're
         * required to sleep, get out to raise an error instead.
...

The problem is not so easy to reproduce. I got it when performe commits
of transaction with large number of subtransactions with several
synchronous replicas.
But it is possible to artificially introduce delay between
RecordTransactionCommit() and ProcArrayEndTransaction():

diff --git a/src/backend/access/transam/xact.c
b/src/backend/access/transam/xact.c
index ca6f6d57d3..d095fd13b0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2187,6 +2187,7 @@ CommitTransaction(void)
                 * durably commit.
                 */
                latestXid = RecordTransactionCommit();
+               pg_usleep(random() % 100000);
        }
        else
        {

Then the problem can be reproduced with the following pgbebch script:

update.sql:
\set aid random(1, 1000)
select do_update(:aid,100)

update function:
create or replace function do_update(id integer, level integer) returns
void as $$
begin
    begin
        if level > 0 then
            perform do_update(id, level-1);
        else
            update pgbench_accounts SET abalance = abalance + 1 WHERE
aid = id;
        end if;
    exception WHEN OTHERS THEN
        raise notice '% %', SQLERRM, SQLSTATE;
    end;
end; $$ language plpgsql;

pgbench -i postgres
pgbench  -f update.sql -c 100 -T 100 -P 1 postgres

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-02-10 16:51:08 Re: can we add subscription TAP test option "vcregress subscriptioncheck" for MSVC builds?
Previous Message Peter Geoghegan 2022-02-10 16:25:31 Re: pg_walinspect - a new extension to get raw WAL data and WAL stats