From: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | out-of-order XID insertion in KnownAssignedXids |
Date: | 2018-10-05 07:06:45 |
Message-ID: | 0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
Looks like there is a bug with logging running transactions XIDs and
prepared transactions.
One of our customers get error "FATAL: out-of-order XID insertion in
KnownAssignedXids"
trying to apply backup.
WAL contains the following record:
rmgr: Standby len (rec/tot): 98/ 98, tx: 0, lsn:
1418/A9A76C90, prev 1418/A9A76C48, desc: RUNNING_XACTS nextXid
2004495309 latestCompletedXid 2004495307 oldestRunningXid 2004495290; 3
xacts: 2004495290 2004495308 2004495308
As you can notice, XID 2004495308 is encountered twice which cause error
in KnownAssignedXidsAdd:
if (head > tail &&
TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1],
from_xid))
{
KnownAssignedXidsDisplay(LOG);
elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
}
The probability of this error is very small but it can quite easily
reproduced: you should just set breakpoint in debugger after calling
MarkAsPrepared in twophase.c and then try to prepare any transaction.
MarkAsPrepared will add GXACT to proc array and at this moment there
will be two entries in procarray with the same XID:
(gdb) p procArray->numProcs
$2 = 4
(gdb) p allPgXact[procArray->pgprocnos[0]]
$4 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0
'\000', delayChkpt = 1 '\001', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[1]]
$5 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000',
delayChkpt = 0 '\000', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[2]]
$6 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000',
delayChkpt = 0 '\000', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[3]]
$7 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0
'\000', delayChkpt = 0 '\000', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
Then you should just wait for sometime until checkpoint timeout is
triggered and it logs snapshot:
(gdb) bt
#0 0x00000000007f3dab in GetRunningTransactionData () at procarray.c:2240
#1 0x00000000007fab22 in LogStandbySnapshot () at standby.c:943
#2 0x000000000077cde8 in BackgroundWriterMain () at bgwriter.c:331
#3 0x00000000005377f3 in AuxiliaryProcessMain (argc=2,
argv=0x7ffe00aa00e0) at bootstrap.c:446
#4 0x000000000078e07e in StartChildProcess (type=BgWriterProcess) at
postmaster.c:5323
#5 0x000000000078b6f0 in reaper (postgres_signal_arg=17) at
postmaster.c:2948
#6 <signal handler called>
#7 0x00007f1356d665b3 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#8 0x0000000000789931 in ServerLoop () at postmaster.c:1765
#9 0x000000000078906c in PostmasterMain (argc=3, argv=0x1902640) at
postmaster.c:1406
#10 0x00000000006d0e4f in main (argc=3, argv=0x1902640) at main.c:228
Now generated RUNNING_XACTS record contains duplicated XIDs.
I want to ask opinion of community about the best way of fixing this
problem.
Should we avoid storing duplicated XIDs in procarray (by invalidating
XID in original pgaxct)
or eliminate/change check for duplicate in KnownAssignedXidsAdd (for
example just ignore duplicates)?
From | Date | Subject | |
---|---|---|---|
Next Message | Anushi Maheshwari | 2018-10-05 07:30:23 | GCI Mentor |
Previous Message | Michael Paquier | 2018-10-05 07:05:18 | Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error. |