Re: Potential hot-standby bug around xacts committed but in xl_running_xacts

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject: Re: Potential hot-standby bug around xacts committed but in xl_running_xacts
Date: 2017-05-02 05:12:41
Message-ID: CANP8+jLtrRs951zBgjoOp3r=HHKJpPvL0=WF4KUteec-9NQSjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 May 2017 at 22:38, Andres Freund <andres(at)anarazel(dot)de> wrote:

> The thread below http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
> describes an issue in logical decoding that arises because
> xl_running_xacts' contents aren't necessarily coherent with the contents
> of the WAL, because RecordTransactionCommit() / RecordTransactionAbort()
> don't have any interlock against the procarray. That means
> xl_running_xacts can contain transactions assumed to be running, that
> already have their commit/abort records WAL logged.

That is known/by design.

> I think that's not just problematic in logical decoding, but also
> Hot-Standby. Consider the following:
>
> ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not
> suboverflowed, and thus will change to STANDBY_SNAPSHOT_READY.

Yes

> In that
> case it'll populate the KnownAssignedXids machinery using
> KnownAssignedXidsAdd().

No, because of this section of code in ProcArrayApplyRecoveryInfo()

/*
* The running-xacts snapshot can contain xids that were still visible
* in the procarray when the snapshot was taken, but were already
* WAL-logged as completed. They're not running anymore, so ignore
* them.
*/
if (TransactionIdDidCommit(xid) || TransactionIdDidAbort(xid))
continue;

> Am I missing something that protects against the above scenario?

It does seem so, yes. The locking issues in Hot Standby are complex,
but they do seem to be correct, thanks for reviewing them.

This is documented in multiple locations, including what I thought was
your own comment in LogStandbySnapshot().

What I suggest is that with logical decoding in mind we do this
1. Inject a new record XLOG_SNAPSHOT_START at the start of
LogStandbySnapshot(). We start logical decoding from there.
2. Record any transactions that end
3. Now the full XLOG_RUNNING_XACTS record arrives. We apply all xacts
that are seen as running, minus any ended between 1 and 3

This avoids the problems for the race but without holding locks while
we log XLOG_RUNNING_XACTS, something that was considered painful for
Hot Standby.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-05-02 06:23:26 Re: CTE inlining
Previous Message Noah Misch 2017-05-02 04:28:06 Re: Transition tables for triggers on foreign tables and views