From: | Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> |
---|---|
To: | Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> |
Subject: | Re: Failed recovery with new faster 2PC code |
Date: | 2017-04-18 08:57:10 |
Message-ID: | CAMGcDxeykkrKCk0FY9Pzt5JusLWw4woKXs8NoqjbOZfQQZ-i2Q@mail.gmail.com |
Lists: | pgsql-hackers |
Please find attached a second version of my bug fix which is stylistically
better and clearer than the first one.
Regards,
Nikhils
On 18 April 2017 at 13:47, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> There was a bug in the redo 2PC remove code path, because of which
> autovacuum would think that the 2PC transaction was gone and remove the
> corresponding clog entry earlier than needed.
>
> Please find attached the bug fix: 2pc_redo_remove_bug.patch.
>
> I have been testing this on top of Michael's 2pc-restore-fix.patch and
> things have seemed OK for the past hour or more. I will keep it running longer.
>
> Jeff, thanks for these very useful scripts. I am going to make a habit of
> running these scripts on my side from now on. Do you have any other scripts that
> I could try against these patches? Please let me know.
>
> Regards,
> Nikhils
>
> On 18 April 2017 at 12:09, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
> wrote:
>
>>
>>
>> On 17 April 2017 at 15:02, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
>> wrote:
>>
>>>
>>>
>>>> >> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>>>> >> Author: Simon Riggs <simon(at)2ndQuadrant(dot)com>
>>>> >> Date: Tue Apr 4 15:56:56 2017 -0400
>>>> >>
>>>> >> Speedup 2PC recovery by skipping two phase state files in normal path
>>>> >
>>>> > Thanks Jeff for your tests.
>>>> >
>>>> > So that's now two crash bugs in as many days and lack of clarity about
>>>> > how to fix it.
>>>> >
>>>>
>>>
>> The issue seems to be that a prepared transaction is yet to be
>> committed, but autovacuum comes in and causes the clog to be truncated
>> beyond this prepared transaction ID in one of the runs.
>>
>> We only add the corresponding pgproc entry for a surviving 2PC
>> transaction on completion of recovery, so there could be a race condition here.
>> Digging in further.
>>
>> Regards,
>> Nikhils
>> --
>> Nikhil Sontakke http://www.2ndQuadrant.com/
>> PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
>>
>
>
>
> --
> Nikhil Sontakke http://www.2ndQuadrant.com/
> PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
>
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
2pc_redo_remove_bug_v2.0.patch | application/octet-stream | 786 bytes |