From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Subject: | Re: First-draft release notes for back-branch releases |
Date: | 2018-11-07 01:17:37 |
Message-ID: | 20181107011737.GD1677@paquier.xyz |
Lists: | pgsql-hackers |
On Tue, Nov 06, 2018 at 11:44:56PM +0000, Andrew Gierth wrote:
> The commit message doesn't really show the severity of the problem at
> all.
I take the blame for that, and my apologies, for what it's worth.
> The users whose case I was diagnosing on IRC were finding that their
> monitoring system was sufficient to trigger the problem at least 80% of
> the time. Consider that the broken minRecoveryPoint can be quite a long
> way in the past relative to on-disk data pages, so the window of
> vulnerability isn't necessarily small.
The first report on the matter after the last point release is here, and
those folks had exactly the same symptoms, with clients aggressively
connecting to the standby:
https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org
And that report came in pretty quickly after the release.
> So while there _probably_ isn't any data corruption, the standby can get
> into a state that isn't restartable unless you know to block client
> connections to it until it has caught up. Rebuilding the standby from
> the master will work but that may be a significant practical problem if
> the data is large.
The problem shows up only if you force a crash recovery when restarting
the standby, not when you let it shut down cleanly. Corruption could
actually happen if you try to promote the standby before it reaches the
actual recovery LSN, in the case where it failed to update
minRecoveryPoint after performing a crash recovery. However, this is a
problem only if you have a standby do a crash recovery and a promotion
immediately afterwards. It does not happen when recovering from a
backup, as the minimum recovery LSN then comes from the backup end
record, not from the control file.
--
Michael