Re: The danger of deleting backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: The danger of deleting backup_label
Date: 2023-10-12 14:19:15
Message-ID: 65825be1-e79a-46f4-9d9f-4ff95a10e378@pgmasters.net
Lists: pgsql-hackers

Hi Thomas,

On 10/11/23 18:10, Thomas Munro wrote:
>
> Even though I spent a whole bunch of time trying to figure out how to
> make concurrent reads of the control file sufficiently atomic for
> backups (pg_basebackup and low level filesystem tools), and we
> explored multiple avenues with varying results, and finally came up
> with something that basically works pretty well... actually I just
> hate all of that stuff, and I'm hoping to be able to just withdraw
> https://commitfest.postgresql.org/45/4025/ and chalk it all up to
> discovery/education and call *this* thread the real outcome of that
> preliminary work.
>
> So I'm +1 on the idea of putting a control file image into the backup
> label and I'm happy that you're looking into it.

Well, hopefully this thread will *at least* be the solution going
forward. Not sure about a back patch yet, see below...

> We could just leave the control file out of the base backup
> completely, as you said, removing a whole foot-gun.

That's the plan.

> People following
> the 'low level' instructions will still get a copy of the control file
> from the filesystem, and I don't see any reliable way to poison that
> file without also making it so that a crash wouldn't also be prevented
> from recovering. I have wondered about putting extra "fingerprint"
> information into the control file such as the file's path and inode
> number etc, so that you can try to distinguish between a control file
> written by PostgreSQL, and a control file copied somewhere else, but
> that all feels too fragile, and at the end of the day, people
> following the low level backup instructions had better follow the low
> level backup instructions (hopefully via the intermediary of an
> excellent external backup tool).

Not sure about the inode idea, because it seems OK for people to move a
cluster elsewhere under a variety of circumstances. I do have an idea
about how to mark a cluster as being in "recovery to consistency" mode,
but I'm not quite sure how to atomically turn that off at the end of
recovery to consistency. I have some ideas I'll work on, though.

> As Stephen mentioned[1], we could perhaps also complain if both backup
> label and control file exist, and then hint that the user should
> remove the *control file* (not the backup label!). I had originally
> suggested we would just overwrite the control file, but by explicitly
> complaining about it we would also bring the matter to tool/script
> authors' attention, ie that they shouldn't be backing that file up, or
> should be removing it in a later step if they copy everything. He
> also mentions that there doesn't seem to be anything stopping us from
> back-patching changes to the backup label contents if we go this way.
> I don't have a strong opinion on that and we could leave the question
> for later.

I'm worried about the possibility of back patching this unless the
solution turns out to be simpler than I think, and that rarely comes to
pass. Surely we can't back patch something that throws errors on a
configuration that is currently valid (i.e. backup_label and pg_control
both present).

But perhaps there is a simpler, acceptable solution we could back patch
(transparent to all parties except Postgres) and then a more advanced
solution we could go forward with.

I guess I had better get busy on this.

Regards,
-David

[1]
https://www.postgresql.org/message-id/ZL69NXjCNG%2BWHCqG%40tamriel.snowman.net
