Re: pg_combinebackup fails on file named INCREMENTAL.*

From: Stefan Fercot <stefan(dot)fercot(at)protonmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: pg_combinebackup fails on file named INCREMENTAL.*
Date: 2024-04-16 16:06:06
Message-ID: agep1fDgX2M536xHp5Z0J0J90jaDGK2A_juQNbfRtW95lic18N3smUL201DWTlnFKMQ-XMYvK4dVSObv_qt7GBbEQ2y_GC4r2Dj4Ifr5ZY0=@protonmail.com
Lists: pgsql-hackers

On Tuesday, April 16th, 2024 at 3:22 PM, Robert Haas wrote:
> What I fear is that this will turn into another situation like we had
> with pg_xlog, where people saw "log" in the name and just blew it
> away. Matter of fact, I recently encountered one of my few recent
> examples of someone doing that thing since the pg_wal renaming
> happened. Some users don't take much convincing to remove anything
> that looks inessential. And what I'm particularly worried about with
> this feature is tar-format backups. If you have a directory format
> backup and you do an "ls", you're going to see a whole bunch of files
> in there of which backup_manifest will be one. How you treat that file
> is just going to depend on what you know about its purpose. But if you
> have a tar-format backup, possibly compressed, the backup_manifest
> file stands out a lot more. You may have something like this:
>
> backup_manifest root.tar.gz 16384.tar.gz

Sure, I can see your point here, and how people could be tempted to throw away that backup_manifest if they don't know how important it is to keep it.
In that case we would probably need the list to be inside the tar, just like backup_label and tablespace_map.

> The kicker for me is that I can't see any reason to do any of this
> stuff. Including the information that we need to elide incremental
> stubs in some other way, say with one stub-list per directory, will be
> easier to implement and probably perform better. Like, I'm not saying
> we can't find a way to jam this into the manifest. But I'm fairly sure
> it's just making life difficult for ourselves.
>
> I may ultimately lose this argument, as I did the one about whether
> the backup_manifest should be JSON or some bespoke format. And that's
> fine. I respect your opinion, and David's. But I also reserve the
> right to feel differently, and I do.

Do you mean one stub-list per pgdata plus one per tablespace?

Sure, it is important to respect and value each other's feelings; I never said otherwise.

I don't really see how it would be faster to recursively go through every subdirectory of the pgdata and tablespaces to gather all the pieces together, compared to reading one main file.
But I guess, whichever option we choose, we will only find out how well it works once people use it in the field and possibly give some feedback.
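
To make the comparison concrete, here is a minimal Python sketch of the two lookups being discussed. It is only an illustration: the `stub_list` file name, its one-entry-per-line format, and both function names are assumptions made up for this example, and nothing like them exists in pg_combinebackup today.

```python
import os

# Hypothetical file name and format (one entry per line);
# PostgreSQL defines no such file.
STUB_LIST_NAME = "stub_list"


def read_list(path):
    """Return the non-empty, stripped lines of a list file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def stubs_from_per_directory_lists(root):
    """Walk every subdirectory and merge the per-directory stub lists.

    Each list names files relative to the directory that contains it.
    """
    stubs = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if STUB_LIST_NAME in filenames:
            rel_dir = os.path.relpath(dirpath, root)
            for name in read_list(os.path.join(dirpath, STUB_LIST_NAME)):
                stubs.add(os.path.normpath(os.path.join(rel_dir, name)))
    return stubs


def stubs_from_single_list(root):
    """Read one top-level list whose entries are paths relative to root."""
    entries = read_list(os.path.join(root, STUB_LIST_NAME))
    return {os.path.normpath(p) for p in entries}
```

Both functions return the same set of relative paths for equivalent input; the difference is that the first has to visit every directory while the second opens a single file. Which one is actually cheaper in a real backup is exactly the open question.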

As you mentioned in [1], we're not going to start rewriting the implementation a week after feature freeze, and we probably shouldn't start building big new things now anyway.
So maybe let's start by documenting the possible gotchas and corner cases, to make our support life easier in the future.

Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

[1] https://www.postgresql.org/message-id/CA%2BTgmoaVxr_o3mrDBrBcXm3gowr9Qc4ABW-c73NR_201KkDavw%40mail.gmail.com
