From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: The danger of deleting backup_label |
Date: | 2023-10-12 14:19:15 |
Message-ID: | 65825be1-e79a-46f4-9d9f-4ff95a10e378@pgmasters.net |
Lists: | pgsql-hackers |
Hi Thomas,
On 10/11/23 18:10, Thomas Munro wrote:
>
> Even though I spent a whole bunch of time trying to figure out how to
> make concurrent reads of the control file sufficiently atomic for
> backups (pg_basebackup and low level filesystem tools), and we
> explored multiple avenues with varying results, and finally came up
> with something that basically works pretty well... actually I just
> hate all of that stuff, and I'm hoping to be able to just withdraw
> https://commitfest.postgresql.org/45/4025/ and chalk it all up to
> discovery/education and call *this* thread the real outcome of that
> preliminary work.
>
> So I'm +1 on the idea of putting a control file image into the backup
> label and I'm happy that you're looking into it.
Well, hopefully this thread will *at least* be the solution going
forward. Not sure about a back patch yet, see below...
> We could just leave the control file out of the base backup
> completely, as you said, removing a whole foot-gun.
That's the plan.
> People following
> the 'low level' instructions will still get a copy of the control file
> from the filesystem, and I don't see any reliable way to poison that
> file without also making it so that a crash wouldn't also be prevented
> from recovering. I have wondered about putting extra "fingerprint"
> information into the control file such as the file's path and inode
> number etc, so that you can try to distinguish between a control file
> written by PostgreSQL, and a control file copied somewhere else, but
> that all feels too fragile, and at the end of the day, people
> following the low level backup instructions had better follow the low
> level backup instructions (hopefully via the intermediary of an
> excellent external backup tool).
Not sure about the inode idea, because it seems OK for people to move a
cluster elsewhere under a variety of circumstances. I do have an idea
about how to mark a cluster in "recovery to consistency" mode, but not
quite sure how to atomically turn that off at the end of recovery to
consistency. I have some ideas I'll work on though.
> As Stephen mentioned[1], we could perhaps also complain if both backup
> label and control file exist, and then hint that the user should
> remove the *control file* (not the backup label!). I had originally
> suggested we would just overwrite the control file, but by explicitly
> complaining about it we would also bring the matter to tool/script
> authors' attention, ie that they shouldn't be backing that file up, or
> should be removing it in a later step if they copy everything. He
> also mentions that there doesn't seem to be anything stopping us from
> back-patching changes to the backup label contents if we go this way.
> I don't have a strong opinion on that and we could leave the question
> for later.
I'm worried about the possibility of back patching this unless the
solution turns out to be simpler than I think, and that rarely comes to
pass. Surely we can't back patch throwing errors on something that is
currently valid (i.e. backup_label and pg_control both present).
But perhaps there is a simpler, acceptable solution we could back patch
(transparent to all parties except Postgres) and then a more advanced
solution we could go forward with.
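
For illustration, the "complain if both exist" check quoted above could
look roughly like this from a backup tool's or restore script's side.
This is a minimal Python sketch, not PostgreSQL code: the function name
check_restored_cluster and the hint wording are hypothetical, and it
assumes the proposed scheme in which backup_label carries the control
file image. Today both files are present in a valid backup, so this
check would be wrong for current releases.

```python
import os
import tempfile

# Paths relative to the data directory. backup_label sits in the data
# directory root and the control file lives at global/pg_control.
BACKUP_LABEL = "backup_label"
PG_CONTROL = os.path.join("global", "pg_control")


def check_restored_cluster(data_dir):
    """Hypothetical pre-start check for a restored data directory.

    Returns a hint string if both backup_label and pg_control are
    present, otherwise None. Assumes the proposed (not current) scheme
    where the control file is recreated from backup_label.
    """
    has_label = os.path.isfile(os.path.join(data_dir, BACKUP_LABEL))
    has_control = os.path.isfile(os.path.join(data_dir, PG_CONTROL))
    if has_label and has_control:
        return ("both backup_label and global/pg_control exist; "
                "remove the control file, not the backup label")
    return None


if __name__ == "__main__":
    # Build a throwaway directory that mimics a restore where the tool
    # copied everything, including the control file.
    demo_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_dir, "global"))
    open(os.path.join(demo_dir, BACKUP_LABEL), "w").close()
    open(os.path.join(demo_dir, PG_CONTROL), "w").close()
    print(check_restored_cluster(demo_dir))
```

The hint deliberately names the control file as the one to remove, since
the danger this thread is about is users deleting backup_label instead.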
I guess I had better get busy on this.
Regards,
-David
[1]
https://www.postgresql.org/message-id/ZL69NXjCNG%2BWHCqG%40tamriel.snowman.net