Re: post-freeze damage control

From: Stefan Fercot <stefan(dot)fercot(at)protonmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: post-freeze damage control
Date: 2024-04-09 13:34:12
Message-ID: sMEDEzhBYSazAxBNRq7vVauI0lOz7Bd4z82zG93vMHGGy2GO-DaQ3kuP2hb-DrpAWIHcigTOeA3_3RFiUSzKmsnH8ZPyhuf7IEjXpxsNO2s=@protonmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tuesday, April 9th, 2024 at 2:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> In all sincerity, I appreciate the endorsement. Basically what's been
> scaring me about this feature is the possibility that there's some
> incurable design flaw that I've managed to completely miss. If it has
> some more garden-variety bugs, that's still pretty bad: people will
> potentially lose data and be unable to get it back. But, as long as
> we're able to find the bugs and fix them, the situation should improve
> over time until, hopefully, everybody trusts it roughly as much as we
> trust, say, crash recovery. Perhaps even a bit more: I think this code
> is much better-written than our crash recovery code, which has grown
> into a giant snarl that nobody seems able to untangle, despite
> multiple refactoring attempts. However, if there's some reason why the
> approach is fundamentally unsound which I and others have failed to
> detect, then we're at risk of shipping a feature that is irretrievably
> broken. That would really suck.

IMHO it totally worth shipping such long-waited feature sooner than later.
Yes, it is a complex one, but you started advertising it since last January already, so people should already be able to play with it in Beta.

And as you mentioned in your blog about the evergreen backup:

> But if you're anything like me, you'll already see that this arrangement
> has two serious weaknesses. First, if there are any data-corrupting bugs
> in pg_combinebackup or any of the server-side code that supports
> incremental backup, this approach could get you into big trouble.

At some point, the only way to really validate a backup is to actually try to restore it.
And if people get encouraged to do that faster thanks to incremental backups, they could detect potential issues sooner.
Ultimately, users will still need their full backups and WAL archives.
If pg_combinebackup fails for any reason, the fix will be to perform the recovery from the full backup directly.
They still should be able to recover, just slower.

--
Stefan FERCOT
Data Egret (https://dataegret.com)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-04-09 13:45:11 Re: WIP Incremental JSON Parser
Previous Message Andrei Lepikhov 2024-04-09 13:20:11 Re: post-freeze damage control