Re: Failed recovery with new faster 2PC code

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: Failed recovery with new faster 2PC code
Date: 2017-04-19 02:09:00
Message-ID: CAMkU=1y98=hMk=giv8LDszkZqGgTkk2yYWeHPiz+4SN6m7RL5g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 18, 2017 at 1:17 AM, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
wrote:

> Hi,
>
> There was a bug in the redo 2PC remove code path, because of which
> autovac would think that the 2PC is gone and cause removal of the
> corresponding clog entry earlier than needed.
>
> Please find attached, the bug fix: 2pc_redo_remove_bug.patch.
>
> I have been testing this on top of Michael's 2pc-restore-fix.patch and
> things have seemed OK for the past hour or more. I will keep it running
> for longer.
>
> Jeff, thanks for these very useful scripts. I am going to make a habit of
> running these scripts on my side from now on. Do you have any other scripts
> that I could try against these patches? Please let me know.
>

This script is the only one I have that specifically targets 2PC. I wrote
it last year when the previous round of speed-up code (which avoided
writing the files upon "PREPARE" by delaying them until the next
checkpoint) was developed. I just decided to dust that test off to try
again here. I don't know how to change it to make it more targeted towards
this set of patches. Would this bug have been seen in a replica server in
the absence of crashes, or could it only be triggered during crash recovery
rather than streaming replication?

Cheers,

Jeff
