Re: Logical replication - TRAP: FailedAssertion in pgstat.c

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical replication - TRAP: FailedAssertion in pgstat.c
Date: 2017-05-09 00:32:01
Message-ID: CAD21AoB_p+okFK_tROGxG-P1xfSN2TVxwPZoY8gf9BXcahq-WQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 9, 2017 at 1:26 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> On 08/05/17 17:52, Masahiko Sawada wrote:
>> On Fri, May 5, 2017 at 8:13 PM, Petr Jelinek
>> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>>> On 03/05/17 13:23, Erik Rijkers wrote:
>>>> On 2017-05-03 08:17, Petr Jelinek wrote:
>>>>> On 02/05/17 20:43, Robert Haas wrote:
>>>>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>>>
>>>>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>>>>
>>>>>> Is there anything left to be committed here?
>>>>>>
>>>>>
>>>>> Afaics the fix was not committed. Peter wanted a more comprehensive fix,
>>>>> which didn't happen. I think something like the attached should do the job.
>>>>
>>>> I'm running my pgbench-over-logical-replication test in chunks of 15
>>>> minutes, with different pgbench -c (num clients) and -s (scale) values.
>>>>
>>>> With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
>>>> precise):
>>>>
>>>>> fix-statistics-reporting-in-logical-replication-work.patch
>>>>
>>>> logical replication is still often failing (as expected, I suppose; it
>>>> seems to be because of "initial snapshot too large") but indeed I do not see
>>>
>>> Yes, that's a different thing that we've been discussing a bit in the
>>> snapbuild woes thread.
>>>
>>>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>>>
>>>> (If there is any other configuration of patches worth testing please let
>>>> me know)
>>>>
>>>
>>> Thanks, so the patch works.
>>>
>>
>> I think that we should commit the local transaction that did the initial
>> data copy, and then report stats as well. Currently the table sync worker
>> doesn't commit the local transaction in LogicalRepSyncTableStart
>> (maybe until it applies a commit record?) if its status is changed to
>> SUBREL_STATE_CATCHUP. That's why the table sync worker hits the
>> assertion failure.
>>
>
> That would fix the assert as well, yes, but it would also mean that if
> the worker crashed between the initial copy and the end of catchup, there
> would be no way to restart it without manual intervention from the user
> since the synchronization position would be lost. Hence the fix I
> proposed which does it differently and has the whole sync in a single
> transaction.

I understand now that the whole data synchronization, including applying
logical records after the state has changed to SUBREL_STATE_CATCHUP, should
be done in a single transaction. Thank you for the explanation.
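
To make the trade-off concrete, here is a toy Python model of the two
approaches discussed above. It is only a sketch and not PostgreSQL code: the
Subscriber class, its methods, and the "init"/"ready" state names are all
made up for illustration, and duplicated rows stand in for "needs manual
intervention".

```python
class Subscriber:
    """Hypothetical stand-in for a subscriber table plus its sync-state row."""

    def __init__(self):
        self.rows = []        # committed (durable) table contents
        self.state = "init"   # committed (durable) sync state
        self._txn = None      # uncommitted working copy

    def begin(self):
        self._txn = {"rows": list(self.rows), "state": self.state}

    def copy_rows(self, source):
        self._txn["rows"].extend(source)

    def set_state(self, state):
        self._txn["state"] = state

    def commit(self):
        self.rows, self.state = self._txn["rows"], self._txn["state"]
        self._txn = None

    def crash(self):
        # A crash discards whatever was not committed.
        self._txn = None


def sync_two_transactions(sub, source, crash):
    """Commit the initial copy first, then finish catchup separately."""
    sub.begin()
    sub.copy_rows(source)
    sub.commit()              # copied rows are durable, state is still "init"
    sub.begin()
    if crash:                 # crash between copy and end of catchup
        sub.crash()
        return
    sub.set_state("ready")
    sub.commit()


def sync_single_transaction(sub, source, crash):
    """Do the copy and the catchup in one transaction."""
    sub.begin()
    sub.copy_rows(source)
    if crash:                 # crash between copy and end of catchup
        sub.crash()
        return
    sub.set_state("ready")
    sub.commit()
```

In this model, a crash followed by a retry of sync_two_transactions leaves
the copied rows in the table twice, because the first copy was already
committed while the state was still "init". With sync_single_transaction the
crash rolls everything back, so the retry starts from an empty table and
ends in the "ready" state.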

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
