Re: Add recovery to pg_control and remove backup_label

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add recovery to pg_control and remove backup_label
Date: 2023-11-20 17:53:55
Message-ID: 38b212b7-b8c0-46ea-940b-f27ec3a24deb@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/20/23 12:11, Robert Haas wrote:
> On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> (I am not exactly sure how, but we've lost pgsql-hackers on the way
>> when you sent v5. Now added back in CC with the two latest patches
>> you've proposed attached.)
>>
>> Here is a short summary of what has been missed by the lists:
>> - I've commented that the patch should not create, not show up in
>> fields returned the SQL functions or stream control files with a size
>> of 512B, just stick to 8kB. If this is worth changing this should be
>> applied consistently across the board including initdb, discussed on
>> its own thread.
>> - The backup-related fields in the control file are reset at the end
>> of recovery. I've suggested to not do that to keep a trace of what
>> was happening during recovery. The latest version of the patch resets
>> the fields.
>> - With the backup_label file gone, we lose some information in the
>> backups themselves, which is not good. Instead, you have suggested an
>> approach where this data is added to the backup manifest, meaning that
>> no information would be lost, particularly useful for self-contained
>> backups. The fields planned to be added to the backup manifest are:
>> -- The start and end time of the backup, the end timestamp being
>> useful to know when stop time can be used for PITR.
>> -- The backup label.
>> I've agreed that it may be the best thing to do at this end to not
>> lose any data related to the removal of the backup_label file.
>
> I think we need more votes to make a change this big. I have a
> concern, which I think I've expressed before, that we keep whacking
> around the backup APIs, and that has a cost which is potentially
> larger than the benefits.

From my perspective it's not that big a change for backup software but
it does bring a lot of benefits, including fixing an outstanding bug in
Postgres, i.e. reading pg_control without getting a torn copy.

> The last time we changed the API, we changed
> pg_stop_backup to pg_backup_stop, but this doesn't do that, and I
> wonder if that's OK. Even if it is, do we really want to change this
> API around again after such a short time?

This is a good point. We could just rename again, but not sure what
names to go for this time. OTOH if the backup software is selecting
fields then they will get an error because the names have changed. If
the software is grabbing fields by position then they'll get a
valid-looking result (even if querying by position is a terrible idea).

Another thing we could do is explicitly error if we see backup_label in
PGDATA during recovery. That's just a few lines of code so would not be
a big deal to maintain. This error would only be visible on restore, so
it presumes that backup software is being tested.

Maybe just a rename to something like pg_backup_begin/end would be the
way to go.

> That said, I don't have an intrinsic problem with moving this
> information from the backup_label to the backup_manifest file since it
> is purely informational. I do think there should perhaps be some
> additions to the test cases, though.

A little hard to add to the tests, I think, since they are purely
informational, i.e. not pushed up by the parser. Maybe we could just
grep for the fields?

> I am concerned about the interaction of this proposal with incremental
> backup. When you take an incremental backup, you get something that
> looks a lot like a usable data directory but isn't. To prevent that
> from causing avoidable disasters, the present version of the patch
> adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the
> backup_label. pg_combinebackup knows to look for those fields, and the
> server knows that if they are present it should refuse to start. With
> this change, though, I suppose those fields would end up in
> pg_control. But that does not feel entirely great, because we have a
> goal of keeping the amount of real data in pg_control below 512 bytes,
> the traditional sector size, and this adds another 12 bytes of stuff
> to that file that currently doesn't need to be there. I feel like
> that's kind of a problem.

I think these fields would be handled the same as the rest of the fields
in backup_label: returned from pg_backup_stop() and also stored in
backup_manifest. Third-party software can do as they like with them and
pg_combinebackup can just read from backup_manifest.

As for the pg_control file -- it might be best to give it a different
name for backups that are not essentially copies of PGDATA. On the other
hand, pgBackRest has included pg_control in incremental backups since
day one and we've never had a user mistakenly do a manual restore of one
and cause a problem (though manual restores are not the norm). Still,
probably can't hurt to be a bit careful.

> But my main point here is ... if we have a few more senior hackers
> weigh in and vote in favor of this change, well then that's one thing.
> But IMHO a discussion that's mostly between 2 people is not nearly a
> strong enough consensus to justify this amount of disruption.

We absolutely need more people to look at this and sign off. I'm glad
they have not so far because it has allowed time to whack the patch
around and get it into better shape.

Regards,
-David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-11-20 18:24:42 Re: Use of backup_label not noted in log
Previous Message Tom Lane 2023-11-20 17:45:58 Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500