Re: Add -k/--link option to pg_combinebackup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Israel Barth Rubio <barthisrael(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add -k/--link option to pg_combinebackup
Date: 2025-03-03 17:51:22
Message-ID: CA+Tgmoaz19ozA5hsJFC+P6aWAnsLP0KR6tmdS6wk54-LDWh92A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 3, 2025 at 9:00 AM Israel Barth Rubio <barthisrael(at)gmail(dot)com> wrote:
> > 2) Since it is a new file, "Copyright (c) 2021-2025" should be
> > "Copyright (c) 2025"
>
> Done!

For the record, this proposed change is not a project policy, AIUI. I
don't care very much what we do here, but -1 for kibitzing the range
of years people put in the copyright header.

> Attaching the new version of the patch in this reply.
>
> I had to slightly change 010_links.pl and copy_file.c to
> properly handle Windows, as the meson tests on
> Windows had complaints with the code of v3:
>
> * copy_file.c was forcing CopyFile on Windows regardless
> of the option chosen by the user. Now, to make that behavior
> backward compatible, I'm only forcing CopyFile on Windows
> when the copy method is not --link. That allows my patch to
> work, and doesn't change the previous behavior.
> * Had to normalize paths when performing string matching in
> the test that verifies the hard link count on files.

When I looked at the code for this test, the questions that sprang to
mind for me were:

1. Is this really going to be stable?
2. Is this going to be too expensive?

With regard to the second question, why does this test need to create
three tables with a million rows each instead of some small number of
rows? If you need to fill multiple blocks, consider making the rows
wider instead of inserting such a large number.

With regard to the first question, I'm going to say that the answer is
"no" because when I run this test locally, I get this:

<lots of output omitted>
[12:26:07.979](0.000s) ok 973 - File
'/Users/robert.haas/pgsql-meson/testrun/pg_combinebackup/010_links/data/t_010_links_restore_data/pgdata/base/5/2664'
has 2 hard links
[12:26:07.980](0.000s) ok 974 - File
'/Users/robert.haas/pgsql-meson/testrun/pg_combinebackup/010_links/data/t_010_links_restore_data/pgdata/base/5/1249_vm'
has 2 hard links
[12:26:07.980](0.000s) ok 975 - File
'/Users/robert.haas/pgsql-meson/testrun/pg_combinebackup/010_links/data/t_010_links_restore_data/pgdata/base/5/2836'
has 2 hard links
[12:26:07.980](0.000s) ok 976 - File
'/Users/robert.haas/pgsql-meson/testrun/pg_combinebackup/010_links/data/t_010_links_restore_data/pgdata/base/5/2663'
has 2 hard links
[12:26:07.980](0.000s) ok 977 - File
'/Users/robert.haas/pgsql-meson/testrun/pg_combinebackup/010_links/data/t_010_links_restore_data/pgdata/base/5/6237'
has 2 hard links
Use of uninitialized value $file in stat at
/Users/robert.haas/pgsql/src/bin/pg_combinebackup/t/010_links.pl line
226.

I'm not sure whether that "Use of uninitialized value" message at the
end is killing the script at that point or whether it's just a
warning, but it should be cleaned up either way. For the rest, I'm not
a fan of testing every single file in the data directory like this. It
seems sufficient to me to test two files, one that you expect to
definitely be modified and one that you expect to definitely be not
modified. That assumes that you can guarantee that the file definitely
wasn't modified, which I'm not quite sure about. The test doesn't seem
to disable autovacuum, so what would keep autovacuum from sometimes
processing that table after the full backup and before the incremental
backup, and sometimes not? But there's something else wrong here, too,
because even when I adjusted the test to disable autovacuum, it still
failed in the same way as shown above.

Also, project style is uncuddled braces and lines less than 80 where
possible. So don't do this:

for my $test_3_segment (@test_3_segments) {

The curly brace belongs on the next line. Similarly this should be
three separate lines:

} else {

Regarding long lines, some of these cases are easy to fix:

my $query = "SELECT pg_relation_filepath(oid) FROM pg_class WHERE
relname = '%s';";

This could be fixed by writing my $query = <<EOM;

my $pg_attribute_path = $primary->safe_psql('postgres',
sprintf($query, 'pg_attribute'));

This could be fixed by moving the sprintf() down a line and indenting
it under 'postgres'.

Attached is a small delta patch to fix a few other issues. It does the
following:

* adds a serial comma to pg_combinebackup.sgml. This isn't absolutely
mandatory but we usually prefer this style.

* puts the new -k option in proper alphabetical order in several places

* changes the test in copy_file() to test for == COPY_METHOD_COPY
instead of != COPY_METHOD_COPYFILE and updates the comment to reflect
what I believe the actual reasoning to be

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
rmh-tweaks.txt text/plain 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Guillaume Lelarge 2025-03-03 18:06:38 Re: making EXPLAIN extensible
Previous Message Heikki Linnakangas 2025-03-03 17:44:45 Handle interrupts while waiting on Append's async subplans