
Re: snapbuild woes

From:	Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: snapbuild woes
Date:	2016-12-12 23:38:26
Message-ID:	7023017e-8b08-eaba-396c-80baf0a793c0@2ndquadrant.com
Lists:	pgsql-hackers

On 12/12/16 23:33, Andres Freund wrote:
> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>> On 12/12/16 22:42, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>>>> Hi,
>>>> The first one is an outright bug, which has to do with how we track running
>>>> transactions. What snapbuild basically does while building the initial
>>>> snapshot is read the xl_running_xacts record, store the list of running
>>>> txes and then wait until they all finish. The problem with this is that
>>>> xl_running_xacts does not ensure that it only logs transactions that are
>>>> actually still running (to avoid locking PGPROC), so there might be xids
>>>> in xl_running_xacts that had already committed before it was logged.
>>>
>>> I don't think that's actually true? Notice how LogStandbySnapshot()
>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>> wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
>>> observed must actually be a bit more complex :(
>>>
>>
>> Hmm, interesting, I did see the transaction commit in the WAL before the
>> xl_running_xacts that contained the xid as running. I have only seen it on
>> a production system, though; I didn't manage to reproduce it easily
>> locally.
>
> I suspect the reason for that is that RecordTransactionCommit() doesn't
> conflict with ProcArrayLock in the first place - only
> ProcArrayEndTransaction() does. So they're still running in the PGPROC
> sense, just not the crash-recovery sense...
>

That looks like a reasonable explanation. BTW, I realized my patch needs a
bit more work: currently it breaks the actual snapshot, because it behaves
the same as if the xl_running_xacts record were empty, which is not correct
AFAICS.

Also, if we took the approach suggested by my patch (i.e. using this
xmin/xmax comparison), I guess we would no longer need to hold the lock for
the extra time at wal_level logical.

That is, of course, unless you think it should be approached from the
other side of the stream, by trying to log a correct xl_running_xacts.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: snapbuild woes at 2016-12-12 22:33:38 from Andres Freund

Responses

Re: snapbuild woes at 2017-02-22 02:05:42 from Petr Jelinek
