From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: Assertion failure in SnapBuildInitialSnapshot() |
Date: | 2024-06-11 19:00:01 |
Message-ID: | b91cf8ef-b5af-5def-ff05-bd67336ef907@gmail.com |
Lists: | pgsql-hackers |
Hello,
01.02.2024 21:20, vignesh C wrote:
> The patch which you submitted has been awaiting your attention for
> quite some time now. As such, we have moved it to "Returned with
> Feedback" and removed it from the reviewing queue. Depending on
> timing, this may be reversible. Kindly address the feedback you have
> received, and resubmit the patch to the next CommitFest.
While analyzing buildfarm failures, I found [1], which demonstrates the
assertion failure discussed here:
---
031_column_list_publisher.log
TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(safeXid, snap->xmin)", File:
"/home/bf/bf-build/skink/REL_15_STABLE/pgsql.build/../pgsql/src/backend/replication/logical/snapbuild.c", Line: 614,
PID: 1882382)
---
I've managed to reproduce the assertion failure on REL_15_STABLE with the
following modification:
@@ -3928,6 +3928,7 @@ ProcArraySetReplicationSlotXmin(TransactionId xmin, TransactionId catalog_xmin,
 {
     Assert(!already_locked || LWLockHeldByMe(ProcArrayLock));
+pg_usleep(1000);
     if (!already_locked)
         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
using the script:
numjobs=100

createdb db
export PGDATABASE=db

for ((i=1;i<=100;i++)); do
    echo "iteration $i"

    for ((j=1;j<=numjobs;j++)); do
        echo "
SELECT pg_create_logical_replication_slot('s$j', 'test_decoding');
SELECT txid_current();
" | psql >>/dev/null 2>&1 &

        echo "
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
CREATE_REPLICATION_SLOT slot$j LOGICAL test_decoding USE_SNAPSHOT;
" | psql -d "dbname=db replication=database" >>/dev/null 2>&1 &
    done
    wait

    for ((j=1;j<=numjobs;j++)); do
        echo "
DROP_REPLICATION_SLOT slot$j;
" | psql -d "dbname=db replication=database" >/dev/null

        echo "SELECT pg_drop_replication_slot('s$j');" | psql >/dev/null
    done

    grep 'TRAP' server.log && break;
done
(with
wal_level = logical
max_replication_slots = 200
max_wal_senders = 200
in postgresql.conf)
iteration 18
ERROR: replication slot "slot13" is active for PID 538431
TRAP: FailedAssertion("TransactionIdPrecedesOrEquals(safeXid, snap->xmin)", File: "snapbuild.c", Line: 614, PID: 538431)
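For anyone repeating this, the postgresql.conf settings listed with the script could be applied roughly as follows. This is only a sketch: the helper name append_repro_settings and the use of $PGDATA as the data directory are assumptions made for illustration, and the three settings are the only part taken from the report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): append the three settings
# from the report to the postgresql.conf file given as the first argument.
append_repro_settings() {
    local conf="$1"
    # Quoted heredoc delimiter, so nothing inside is expanded by the shell.
    cat >>"$conf" <<'EOF'
wal_level = logical
max_replication_slots = 200
max_wal_senders = 200
EOF
}
```

One possible way to use it, assuming a scratch cluster in $PGDATA and a server log named server.log (the file the script greps): run initdb -D "$PGDATA", then append_repro_settings "$PGDATA/postgresql.conf", then pg_ctl -D "$PGDATA" -l server.log start.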
I've also confirmed that fix_concurrent_slot_xmin_update.patch fixes the
issue.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-05-15%2020%3A55%3A17
Best regards,
Alexander