From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> |
Cc: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Subject: | Re: [HACKERS] Issues with logical replication |
Date: | 2017-11-15 20:09:11 |
Message-ID: | CA+TgmoaL-awk7Es2eXL=wxrpQqv+yxM_7FJe2ttx+VpRHKSbbw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Oct 9, 2017 at 9:19 PM, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
> I investigated this case and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
> does not always work as expected. XactLockTableWait() waits on LockAcquire() for the xid to be
> completed, and if we finally get this lock but the transaction is still in progress, then such an xid
> is assumed to be a subxid. However, a LockAcquire/LockRelease cycle can happen after the transaction
> has set its xid, but before XactLockTableInsert().
>
> Namely, the following history happened for xid 5225 and led to a crash:
>
> [backend] LOG: AssignTransactionId: XactTopTransactionId = 5225
> [walsender] LOG: LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
> [walsender] LOG: XactLockTableWait: LockAcquire 5225
> [walsender] LOG: XactLockTableWait: LockRelease 5225
> [backend] LOG: AssignTransactionId: LockAcquire ExclusiveLock 5225
> [walsender] LOG: TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
> [backend] LOG: CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
> [backend] LOG: CommitTransaction: ResourceOwnerRelease locks xid=5225
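The interleaving in the quoted log can be replayed with a small toy model. This is only a sketch under stated assumptions: it is not PostgreSQL code, and ToyLockTable and replay_race are hypothetical names invented for illustration. It models the xid lock as a dictionary entry and replays the steps in the order the log shows, to make visible why the waiter gets the lock immediately while the xid still counts as in progress.

```python
# Toy model of the interleaving in the quoted log. NOT PostgreSQL code.
# ToyLockTable and replay_race are hypothetical names used only here.


class ToyLockTable:
    """Maps xid -> holder name; a stand-in for the real lock table."""

    def __init__(self):
        self.holders = {}

    def try_acquire(self, xid, who):
        # A real waiter would block if the lock were held; this toy
        # version just reports whether the lock was free.
        if xid in self.holders:
            return False
        self.holders[xid] = who
        return True

    def release(self, xid, who):
        if self.holders.get(xid) == who:
            del self.holders[xid]


def replay_race(xid=5225):
    locks = ToyLockTable()
    running = set()

    # backend: the xid is assigned and already counts as running,
    # but the backend has not yet taken the lock on its own xid.
    running.add(xid)

    # walsender: "waits" by acquiring and releasing the xid lock.
    # Nobody holds it yet, so this returns without waiting at all.
    acquired_without_wait = locks.try_acquire(xid, "walsender")
    locks.release(xid, "walsender")

    # backend: only now takes the lock on its own xid.
    locks.try_acquire(xid, "backend")

    # walsender: the xid is still in progress, so the waiter treats it
    # as a subxid although it is a top-level xid.
    still_in_progress = xid in running
    return acquired_without_wait, still_in_progress


if __name__ == "__main__":
    print(replay_race())
```

In this model both returned values are true, which is the combination the quoted analysis describes: the lock was obtained, yet the transaction is still in progress.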
Ouch. This seems like a bug that needs to be fixed, but do you think
it's related to Petr's proposed fix to set es_output_cid? That fix
looks reasonable, since we shouldn't try to lock tuples without a
valid CommandId.
Now, having said that, I understand how the lack of that fix could cause:
2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple
But I do not understand how it could cause:
#3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:582
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company