Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Subject: Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Date: 2021-12-01 16:00:20
Message-ID: cc6d981d-d2f9-2af1-08fd-f4646bc9089e@pgmasters.net
Lists: pgsql-hackers

On 11/30/21 19:54, Michael Paquier wrote:
> On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote:
>> I did figure out how to keep the safe part of exclusive backup (not having
>> to maintain a connection) while removing the dangerous part (writing
>> backup_label into PGDATA), but it was a substantial amount of work and I
>> felt that it had little chance of being committed.
>
> Which was, I guess, done by storing the backup_label contents within a
> file different than backup_label, still maintained in the main data
> folder to ensure that it gets included in the backup?

That, or emit it from pg_start_backup() so the user can write it
wherever they please. That would include writing it into PGDATA if they
really wanted to, but that would be on them and the default behavior
would be safe. The problem with this is that if the user does not
rename/supply backup_label on restore, they will get corruption and
not know it.

Here's another idea. Since the contents of pg_wal are not supposed to be
copied, we could add a file there to indicate that the cluster should
remove backup_label on restart. Our instructions also say to remove the
contents of pg_wal on restore if they were originally copied, so
hopefully one of the two would happen. But, again, if they fail to
follow the directions it would lead to corruption.

Order would be important here. When starting the backup the proper order
would be to write pg_wal/backup_in_progress and then backup_label. When
stopping the backup they would be removed in the reverse order.
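
The ordering above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the function names and the _write_durably helper are hypothetical, and a real implementation would also need to fsync the containing directories, which is omitted here.

```python
import os

# File names taken from the proposal; both paths are relative to PGDATA.
MARKER = os.path.join("pg_wal", "backup_in_progress")
LABEL = "backup_label"


def _write_durably(path, data):
    """Hypothetical helper: write a file and fsync it before returning."""
    with open(path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def start_exclusive_backup(pgdata, label_contents):
    # Marker first: a crash between the two writes leaves only the marker,
    # which a restart can simply remove.
    _write_durably(os.path.join(pgdata, MARKER), "")
    _write_durably(os.path.join(pgdata, LABEL), label_contents)


def stop_exclusive_backup(pgdata):
    # Reverse order: backup_label goes first, so the live cluster never
    # holds backup_label without the marker next to it.
    os.remove(os.path.join(pgdata, LABEL))
    os.remove(os.path.join(pgdata, MARKER))
```

With this ordering, a backup_label found without the marker can only have come from a restored copy (where pg_wal was not copied or was emptied), never from an interrupted backup on the live cluster.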

On restart, if both are present, delete both in the correct order
and start crash recovery using the info in pg_control. If only
backup_label is present, go into recovery using the info from
backup_label.

It's possible for pg_wal/backup_in_progress to be present by itself if
the server crashes after deleting backup_label but before deleting
pg_wal/backup_in_progress. In that case the server should simply remove
it on start and go into crash recovery using the info from pg_control.
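
To make the restart cases concrete, here is a rough sketch of the decision in Python. It is not PostgreSQL code; resolve_startup and the Recovery enum are hypothetical names. I am assuming "the correct order" on restart means the same order as when stopping the backup (backup_label first, then the marker), and the case where neither file exists is assumed to be an ordinary startup.

```python
import os
from enum import Enum


class Recovery(Enum):
    FROM_PG_CONTROL = "crash recovery using pg_control"
    FROM_BACKUP_LABEL = "recovery using backup_label"


def resolve_startup(pgdata):
    """Decide how to start, cleaning up leftover backup files as needed."""
    marker = os.path.join(pgdata, "pg_wal", "backup_in_progress")
    label = os.path.join(pgdata, "backup_label")

    if os.path.exists(marker):
        # The live cluster restarted during a backup, or crashed while
        # stopping one. Remove backup_label first (if it is still there),
        # then the marker, and ignore the label's contents.
        if os.path.exists(label):
            os.remove(label)
        os.remove(marker)
        return Recovery.FROM_PG_CONTROL

    if os.path.exists(label):
        # Only backup_label: this is a restored backup.
        return Recovery.FROM_BACKUP_LABEL

    # Neither file present: assumed to be an ordinary startup.
    return Recovery.FROM_PG_CONTROL
```

The marker is checked first because its presence alone is enough to know the data directory is the original cluster and not a restored copy.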

The advantage of this idea is that it does not change the current
instructions as far as I can see. If the user is already following them,
they'll be fine. If they are not, then they'll need to start doing so.

Of course, none of this affects users who are using non-exclusive
backup, which I do hope covers the majority by now.

Thoughts?

Regards,
-David
