Re: Remove Deprecated Exclusive Backup Mode

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Adrien NAYRAT <adrien(dot)nayrat(at)anayrat(dot)info>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove Deprecated Exclusive Backup Mode
Date: 2019-02-26 17:20:00
Message-ID: CAHGQGwHLuG0dc3TRkzJdVhJKUkR9=K3UMBBHvdNtvht=P+z7Eg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 26, 2019 at 3:17 AM David Steele <david(at)pgmasters(dot)net> wrote:
>
> On 2/25/19 7:50 PM, Fujii Masao wrote:
> > On Mon, Feb 25, 2019 at 10:49 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> >>
> >> I'm not playing devil's advocate here to annoy you. I see the problems
> >> with the exclusive backup, and I see how it can hurt people.
> >> I just think that removing exclusive backup without some kind of help
> >> like Andres sketched above will make people unhappy.
> >
> > +1
> >
> > Another idea is to improve an exclusive backup method so that it will never
> > cause such issue. What about changing an exclusive backup mode of
> > pg_start_backup() so that it creates something like backup_label.pending file
> > instead of backup_label? Then if the database cluster has backup_label.pending
> > file but not recovery.signal (this is the case where the database is recovered
> > just after the server crashes while an exclusive backup is in progress),
> > in this idea, the recovery using that database cluster always ignores
> > (or removes) backup_label.pending file and start replaying WAL from
> > the REDO location that pg_control file indicates. So this idea enables us to
> > work around the issue that an exclusive backup could cause.
>
> It's an interesting idea.
>
> > On the other hand, the downside of this idea is that the users need to change
> > the recovery procedure. When they want to do PITR using the backup having
> > backup_label.pending, they need to not only create recovery.signal but also
> > rename backup_label.pending to backup_label. Rename of backup_label file
> > is brand-new step for their recovery procedure, and changing the recovery
> > procedure might be painful for some users. But IMO it's less painful than
> > removing an exclusive backup API at all.
>
> Well, given that we have invalidated all prior recovery procedures in
> PG12 I'm not sure how big a deal that is. Of course, it's too late make
> a change like this for PG12.
>
> > Thought?
>
> Here's the really obvious bad thing: if users do not update their
> procedures and we ignore backup_label.pending on startup then they will
> end up with a corrupt database because it will not replay from the
> correct checkpoint. If we error on the presence of backup_label.pending
> then we are right back to where we started.

No. In this case, since both backup_label.pending and recovery.signal exist,
as I described in my previous post, the server stops the recovery with
a PANIC error before corrupting the database. Then the operator can
rename backup_label.pending to backup_label and restart the recovery
safely.

So, let me clarify the situations:

(1) If backup_label and recovery.signal exist, the recovery starts safely.
This is the normal case of recovery from the base backup.

(2) If backup_label.pending and recovery.signal exist, as described above,
a PANIC error happens at the start of recovery. This case can happen
if the operator forgets to rename backup_label.pending, i.e., an
operator mistake. So, after the PANIC, the operator needs to fix the
mistake (i.e., rename backup_label.pending to backup_label) and restart
the recovery.

(3) If backup_label.pending exists but recovery.signal doesn't, the server
ignores (or removes) backup_label.pending and does the recovery
starting from the REDO location in pg_control. This case can happen if
the server crashes while an exclusive backup is in progress.
So in this idea, a crash in the middle of a backup doesn't prevent the
recovery from starting.
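
To make the three cases above concrete, here is a minimal sketch in Python
of the decision the server would make at startup. It is only an illustration
of the proposal, not PostgreSQL code: the function name, the enum, and the
handling of any file combination other than (1)-(3) are assumptions made for
this example.

```python
import os
import tempfile
from enum import Enum


class StartupAction(Enum):
    # Case (1): normal recovery from a base backup.
    RECOVER_FROM_BACKUP_LABEL = 1
    # Case (2): operator forgot to rename backup_label.pending.
    PANIC_RENAME_REQUIRED = 2
    # Case (3): crash during an exclusive backup; replay from pg_control.
    CRASH_RECOVERY_FROM_PG_CONTROL = 3
    # Any other combination is not discussed in the proposal (assumption).
    NOT_COVERED = 4


def decide_startup_action(data_dir):
    """Hypothetical helper: map the files present in data_dir to a case."""

    def has(name):
        return os.path.exists(os.path.join(data_dir, name))

    label = has("backup_label")
    pending = has("backup_label.pending")
    signal = has("recovery.signal")

    if label and signal and not pending:
        return StartupAction.RECOVER_FROM_BACKUP_LABEL
    if pending and signal and not label:
        return StartupAction.PANIC_RENAME_REQUIRED
    if pending and not signal and not label:
        return StartupAction.CRASH_RECOVERY_FROM_PG_CONTROL
    return StartupAction.NOT_COVERED


if __name__ == "__main__":
    # Simulate a crash in the middle of an exclusive backup: only
    # backup_label.pending is left behind in the data directory.
    with tempfile.TemporaryDirectory() as data_dir:
        open(os.path.join(data_dir, "backup_label.pending"), "w").close()
        print(decide_startup_action(data_dir).name)
```

In case (2) the fix is the rename that the operator skipped; after renaming
backup_label.pending to backup_label, the same check falls into case (1).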

Regards,

--
Fujii Masao
