Re: Failed recovery with new faster 2PC code

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: Failed recovery with new faster 2PC code
Date: 2017-04-18 10:54:30
Message-ID: CANP8+jK_PF6O3CGkomUbk_hj6-P-GHbphL_wtnCXtVv8y1w7TA@mail.gmail.com
Lists: pgsql-hackers

On 18 April 2017 at 09:57, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> wrote:

> Please find attached a second version of my bug fix which is stylistically
> better and clearer than the first one.

Yeh, this is better. Pushed.

The bug was that the loop left gxact pointing at the last entry in the
array, so the exit condition failed and we then removed the last gxact
from memory even when it didn't match the xid, removing a valid entry
too early. That allowed xmin to move forwards, which caused autovac to
remove pg_xact entries earlier than needed.
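
To make the failure mode concrete, here is a minimal standalone sketch of
that loop pattern. It is not the PostgreSQL source and not the committed
patch: the Gxact struct, the Xid typedef, and both function names are
hypothetical stand-ins chosen for illustration. The first function shows
how a pointer assigned on every iteration can defeat a "not found" check;
the second shows one possible way to keep the result tied to an actual
match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not PostgreSQL's real types. */
typedef unsigned int Xid;

typedef struct
{
    Xid xid;
} Gxact;

/*
 * Buggy pattern: gxact is assigned on every iteration, so after the loop
 * it still points at the last entry even when no xid matched. The
 * "not found" test below therefore only triggers for an empty table.
 * Returns the index that would be removed, or -1 if nothing is removed.
 */
int
find_entry_buggy(const Gxact *table, int n, Xid xid)
{
    const Gxact *gxact = NULL;
    int          i;

    assert(table != NULL);

    for (i = 0; i < n; i++)
    {
        gxact = &table[i];
        if (gxact->xid == xid)
            break;
    }

    if (gxact == NULL)
        return -1;

    /* May be the last entry even though it never matched. */
    return (int) (gxact - table);
}

/*
 * One possible correction: report an index only when an entry
 * actually matched the requested xid.
 */
int
find_entry_fixed(const Gxact *table, int n, Xid xid)
{
    int i;

    assert(table != NULL);

    for (i = 0; i < n; i++)
    {
        if (table[i].xid == xid)
            return i;
    }

    return -1;
}
```

With a three-entry table and an xid that is not present, the buggy
version selects the last entry for removal, while the corrected version
reports that nothing matched.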

Well done for finding that one, thanks for the patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
