Re: Add -k/--link option to pg_combinebackup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Israel Barth Rubio <barthisrael(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add -k/--link option to pg_combinebackup
Date: 2025-02-12 19:54:50
Message-ID: CA+TgmobZDQp=AJH6YQNJ+bTm9MuEQ2We-QBT=OGO9bx0AErAQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Some more review:

+ Use hard links instead of copying files to the synthetic backup.
+ Reconstruction of the synthetic backup might be faster (no file copying
)
+ and use less disk space.

I think it would be good to add a caveat at the end of this sentence:
and use less disk space, but care must be taken when using the output
directory, because any modifications to that directory (for example,
starting the server) can also affect the input directories. Likewise,
changes to the input directories (for example, starting the server on
the full backup) could affect the output directory. Thus, this option
is best used when the input directories are only copies that will be
removed after <application>pg_combienbackup</application> has
completed.

- which can result in near-instantaneous copying of the data files.
+ which can result in near-instantaneous copying of the data
files, giving
+ the speed advantages of <option>-k</option>/<option>--link</option>
+ while leaving the input backups untouched.

I don't think we need this hunk.

+ When <application>pg_combinebackup</application> is run with
+ <option>-k</option>/<option>--link</option>, the output synthetic backup may
+ share several files with the input backups because of the hard links it
+ creates. Any changes performed on files that were linked between any of the
+ input backups and the synthetic backup, will be reflected on both places.

I don't like the wording of this for a couple of reasons. First,
"several" means more than 1 and less than all, but we really have no
idea: it could be none, one, some, many, or all. Second, we want to be
a little careful not to define what a hard link means here. Also,
these notes are a bit far from the documentation of --link, so
somebody might miss them. I suggest that we can probably just leave
out the notes you've added here if we add something like what I
suggested above to the documentation of --link itself.

+ pg_log_warning_hint("If the input directories are not staging "
+ "directories,
it is recommended to move the output"
+ "directory to
another file system or machine "
+ "before
performing changes to the files and/or "
+ "starting the cluster");

I don't think "staging directories" is a sufficiently well-known and
unambiguous term that we should be using it here. Also, "move the
output directory" seems like fuzzy wording. You can really only move
files around within a filesystem; otherwise, you're making a copy and
deleting the original. I think we can just drop this hint entirely.
The primary warning message seems sufficient to me.

I'm still not a fan of the changes to 010_links.pl; let's drop those.

cfbot is fairly unhappy with your patch. See
http://cfbot.cputube.org/israel-barth.html or
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5508 --
I haven't looked into what is going wrong here, and there may well be
more than one thing, but nothing can be committed until this is all
green.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2025-02-12 19:55:53 Re: explain analyze rows=%.0f
Previous Message Matheus Alcantara 2025-02-12 19:53:46 Re: dblink: Add SCRAM pass-through authentication