Re: Test to dump and restore objects left behind by regression

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test to dump and restore objects left behind by regression
Date: 2025-02-11 12:23:20
Message-ID: CAExHW5sBbMki6Xs4XxFQQF3C4Wx3wxkLAcySrtuW3vrnOxXDNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

On Sun, Feb 9, 2025 at 1:25 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Feb 07, 2025 at 07:11:25AM +0900, Michael Paquier wrote:
> > Okay, thanks for the feedback. We have been relying on diff -u for
> > the parts of the tests touched by 0001 for some time now, so if there
> > are no objections I would like to apply 0001 in a couple of days.
>
> This part has been applied as 169208092f5c.

Thanks. PFA rebased patches.

I have added another diff adjustment to adjust_regress_dumpfile().
It's introduced by 83ea6c54025bea67bcd4949a6d58d3fc11c3e21b.

On Thu, Feb 6, 2025 at 11:32 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Feb 05, 2025 at 03:28:04PM +0900, Michael Paquier wrote:
> > Hmm. I was reading through the patch and there is something that
> > clearly stands out IMO: the new compare_dumps(). It is in Utils.pm,
> > and it acts as a wrapper of `diff` with its formalized output format.
> > It is not really about dumps, but about file comparisons. This should
> > be renamed compare_files(), with internals adjusted as such, and
> > reused in all the existing tests. Good idea to use that in
> > 027_stream_regress.pl, actually. I'll go extract that first, to
> > reduce the presence of `diff` in the whole set of TAP tests.
>
> The result of this part is pretty neat, resulting in 0001 where it is
> possible to use the refactored routine as well in pg_combinebackup
> where there is a piece comparing dumps. There are three more tests
> with diff commands and assumptions of their own, that I've left out.
> This has the merit of unifying the output generated should any diffs
> show up, while removing a nice chunk from the main patch.
>
> > AdjustDump.pm looks like a fine concept as it stands. I still need to
> > think more about it. It feels like we don't have the most optimal
> > interface, though, but perhaps that will be clearer once
> > compare_dumps() is moved out of the way.

Without knowing what makes the interface suboptimal, it's hard to make
it optimal. I did think about getting rid of adjust_child_columns
flag. But that either means we adjust CREATE TABLE ... INHERIT
statements from both the dump outputs from original and the restored
database to a canonical form or get rid of the tests in that function
to make sure that the adjustment is required. The first seems more
work (coding and run time). The tests look useful to detect when the
adjustment won't be required.

I also looked at the routines which adjust the dumps from upgrade
tests. They seem to be specific to the older versions and lack the
extensibility you mentioned earlier.

The third thing I looked at was the possibility of applying the
adjustments to only the dump from the original database where it is
required by passing the newline adjustments to compare_files().
However 0002 in the attached set of patches adds more logic,
applicable to both the original and restored dump outputs, to
AdjustDump.pm. So we can't do that either.

I am clueless as to what could be improved here.

>
> + my %dump_formats = ('plain' => 'p', 'tar' => 't', 'directory' => 'd', 'custom' => 'c');
>
> No need for this mapping, let's just use the long options.

Hmm, didn't realize -F accepts whole format name as well. pg_dump
--help doesn't give that impression but user facing documentation
mentions it. Done.

>
> + # restore data as well to catch any errors while doing so.
> + command_ok(
> + [
> + 'pg_dump', "-F$format_spec", '--no-sync',
> + '-d', $src_node->connstr('regression'),
> + '-f', $dump_file
> + ],
> + "pg_dump on source instance in $format format");
>
> The use of command_ok() looks incorrect here. Shouldn't we use
> $src_node->command_ok() here to ensure a correct PATH? That would be
> more consistent with the other dump commands. Same remark about
> @restore_command.

Cluster::command_ok's prologue doesn't mention PATH but mentions
PGHOST and PGPORT.
```
Runs a shell command like PostgreSQL::Test::Utils::command_ok, but
with PGHOST and PGPORT set
so that the command will default to connecting to this PostgreSQL::Test::Cluster
```
According to sub _get_env(), PATH is set only when custom install path
is provided. In the absence of that, build path is used. In this case,
the source and destination nodes are created from the build itself, so
no separate path is required. PGHOST and PGPORT are anyway overridden
by the connection string fetched from the node. So I don't think
there's any correctness issue here, but it's better to use
Cluster::command_ok() just for better readability. Done

>
> + # The order of columns in COPY statements dumped from the original database
> + # and that from the restored database differs. These differences are hard to
>
> What are the relations we are talking about here?

These are the child tables whose parent has added a column after being
inherited. Now that I have more expertise with perl regex, I have
added code in AdjustDump.pm to remove only the COPY statements where
we see legitimate difference. Added a comment explaining the cause
behind the difference. This patch is supposed to be merged into 0001
before committing the change.

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
0001-Test-pg_dump-restore-of-regression-objects-20250211.patch text/x-patch 14.3 KB
0002-Filter-COPY-statements-with-differing-colum-20250211.patch text/x-patch 5.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilia Evdokimov 2025-02-11 12:30:09 Re: Move wal_buffers_full to WalUsage (and report it in pgss/explain)
Previous Message Álvaro Herrera 2025-02-11 12:14:52 Re: pg_stat_statements and "IN" conditions