From: | Israel Barth Rubio <barthisrael(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add -k/--link option to pg_combinebackup |
Date: | 2025-02-08 02:12:59 |
Message-ID: | CAO_rXXB__YFC1R+4C3evc5_55UoqKhZN6C43EqzvrXwDy1A7gQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Robert,
> In general, +1, although I think that the patch needs polishing in a
> bunch of areas.
Thanks for your review!
> Originally, I thought what we wanted was something like a --in-place
> option to pg_combinebackup, allowing the output directory to be the
> same as one of the input directories. However, I now think that what
> this patch does is likely better, and this patch is a lot simpler to
> write than that one would be. With the --in-place option, if I'm
> combine backup1 and backup2, I must write either "pg_combinebackup
> backup1 backup2 -o backup1" or "pg_combinebackup backup1 backup2 -o
> backup2". In either case, I can skip whole-file copies from one of the
> two source directories -- whichever one happens to be the output
> directory. However, if I write "pg_combinebackup --link backup1
> backup2 -o result", I can skip *all* whole-file copies from *every*
> input directory, which seems a lot nicer.
>
> Furthermore, if all the input directories are staging directories,
> basically copies of original backups stored elsewhere, then the fact
> that those directories get trashed is of no concern to me. In fact,
> they don't even get trashed if pg_combinebackup is interrupted partway
> through, because I can just remove the output directory and recreate
> it and everything is fine. With an --in-place option, that would be
> trickier.
I see! I thought of not reinventing the wheel while writing, so I replayed
in pg_combinebackup the same behavior implemented in pg_upgrade.
And turns out the outcomes sounded convincing.
> Regarding the patch itself, I think we need to rethink the test case
> changes in particular. As written, the patch won't do any testing of
> link mode unless the user sets PG_TEST_PG_COMBINEBACKUP_MODE=--link;
> and if they do set that option, then we won't test anything else.
> Also, even with that environment variable set, we'll only test --link
> mode a bit ... accidentally. The patch doesn't really do anything to
> make sure that link mode actually does what it's intended to do. It
> only adapts the existing tests not to fail. I think it would be better
> to decide that you're not supposed to set
> PG_TEST_PG_COMBINEBACKUP_MODE=--link, and then write some new test,
> not depending on PG_TEST_PG_COMBINEBACKUP_MODE, that specifically
> verifies that link mode behaves as expected.
>
> After all, link mode is noticeably different from the other copy
> modes. Those should all produce equivalent results, or fail outright,
> we suppose. This produces a different user-visible result, so we
> probably ought to test that we get that result. For example, we might
> check that certain files end up with a link count of 1 or 2, as
> appropriate.
That makes sense to me too. I'm submitting a new patch which includes
a new file 010_links.pl. That test takes care of generating full and
incremental,
backups, then combines them with --link mode and ensures that the hard link
count for all files under PGDATA is as expected based on the operations that
were applied in the cluster.
I kept the "support" for PG_TEST_PG_COMBINEBACKUP_MODE=--link
around, just in case someone wants to run that and verify those outcomes
with
the --link mode. At least now we might have a proper test for checking
--link,
which always runs independently of the env variable.
> Does link() work on Windows?
link is a function available only in Linux. The equivalent function in
Windows is
CreateHardLink.
However, in Postgres link works in either of the operating systems because
we
implement a link function, which wraps the call to CreateHardLink. That's
coded
in the file src/port/win32link.c
> I'm not sure what to do about the issue that using --link trashes the
> input cluster if you subsequently start the database or otherwise
> modify any hard-linked files. Keep in mind that, for command-line
> arguments other than the first, these are incremental backups, and you
> already can't run postgres on those directories. Only the first input
> directory, which is a full backup not an incremental, is a potential
> target for an unexpected database start. I'm tentatively inclined to
> think we shouldn't modify the input directories and just emit a
> warning like:
>
> warning: --link mode was used; any modifications to the output
> directory may destructively modify input directories
The previous patch already had a warning. But I enjoyed your message,
so I slightly changed the warning and the docs in the new version of the
patch.
Best regards,
Israel.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-pg_combinebackup-add-support-for-hard-links.patch | application/octet-stream | 16.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Japin Li | 2025-02-08 02:16:40 | Re: Modern SHA2- based password hashes for pgcrypto |
Previous Message | Jacob Champion | 2025-02-08 01:56:18 | Re: [PoC] Federated Authn/z with OAUTHBEARER |