Re: out-of-order XID insertion in KnownAssignedXids

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org,Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>,Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: out-of-order XID insertion in KnownAssignedXids
Date: 2018-10-08 15:24:25
Message-ID: 2A819787-9E67-41EF-B89A-A906996924E5@anarazel.de
Lists: pgsql-hackers

On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>
>On 05.10.2018 11:04, Michael Paquier wrote:
>> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>>> As you can see, XID 2004495308 is encountered twice, which causes an error in
>>> KnownAssignedXidsAdd:
>>>
>>>     if (head > tail &&
>>>         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>>>     {
>>>         KnownAssignedXidsDisplay(LOG);
>>>         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>>>     }
>>>
>>> The probability of this error is very small, but it can be reproduced quite
>>> easily: just set a breakpoint in the debugger after the call to
>>> MarkAsPrepared in twophase.c and then try to prepare any transaction.
>>> MarkAsPrepared will add the GXACT to the proc array, and at that moment there
>>> will be two entries in procarray with the same XID:
>>>
>>> [snip]
>>>
>>> Now the generated RUNNING_XACTS record contains duplicate XIDs.
>> So, I have been doing exactly that, and if you trigger a manual
>> checkpoint then things happen quite correctly if you let the first
>> session finish:
>> rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
>> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
>> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>>
>> If you keep the debugger stopped after the call to MarkAsPrepared, then
>> the manual checkpoint blocks. If you keep the debugger stopped and wait
>> for a checkpoint timeout to happen, then I can see the incorrect record.
>> It is impressive that your customer was able to see that first, and then
>> that you were able to get into that state with simple steps.
>>
>>> I would like to ask the community's opinion on the best way to fix this
>>> problem. Should we avoid storing duplicate XIDs in procarray (by
>>> invalidating the XID in the original pgxact), or eliminate or change the
>>> check for duplicates in KnownAssignedXidsAdd (for example, just ignore
>>> duplicates)?
>> Hmmmmm... Please let me think through that first. It seems to me that
>> the record should not be generated to begin with. At least I am able to
>> confirm what you see.
>
>The simplest way to fix the problem is to ignore duplicates before
>adding them to KnownAssignedXids.
>We perform a sort at this point in any case...

I vehemently object to that as the proper course.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
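
For readers following the quoted discussion, here is a minimal, standalone sketch of why a duplicated XID trips the ordering check, and of what "ignore duplicates after the sort" would amount to. It is a simplified model, not PostgreSQL source: the names xid_follows_or_equals, would_trip_order_check and drop_adjacent_duplicates are hypothetical, the comparison only imitates the modular comparison used for normal XIDs, and special XIDs are ignored. It illustrates the proposal being debated, which the reply above objects to as the proper fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* stand-in for TransactionId (assumption) */

/* Modular "a >= b" for normal XIDs; special XIDs are not handled here. */
static bool
xid_follows_or_equals(Xid a, Xid b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Returns true if appending xids[0..n) one by one would violate the
 * "each new XID must strictly follow the previous one" invariant,
 * which is the condition that raises the out-of-order error.
 */
static bool
would_trip_order_check(const Xid *xids, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (xid_follows_or_equals(xids[i - 1], xids[i]))
            return true;
    }
    return false;
}

/*
 * Collapses adjacent duplicates in an already sorted array, in place.
 * Returns the new length. This models the "ignore duplicates" idea only.
 */
static size_t
drop_adjacent_duplicates(Xid *xids, size_t n)
{
    size_t out;

    assert(xids != NULL || n == 0);
    if (n == 0)
        return 0;

    out = 1;
    for (size_t i = 1; i < n; i++)
    {
        if (xids[i] != xids[out - 1])
            xids[out++] = xids[i];
    }
    return out;
}
```

With a made-up sorted list such as {606, 607, 607}, would_trip_order_check reports a violation because the second 607 does not strictly follow the first. After drop_adjacent_duplicates the list is {606, 607} and the check passes. That hides the symptom on the standby, but it does not address why the primary emitted a RUNNING_XACTS record with a duplicate in the first place.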
