Re: Test to dump and restore objects left behind by regression

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Test to dump and restore objects left behind by regression
Date: 2024-10-31 05:56:37
Message-ID: ZyMcFYiuMHXSsAM3@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 09, 2024 at 03:43:58PM +0530, Ashutosh Bapat wrote:
> 894be11adfa60ad1ce5f74534cf5f04e66d51c30 changed the schema in which
> objects in test genereated_stored.sql are created. Because of this the
> new test added by the patch was failing. Fixed the failure in the
> attached.

On my laptop, testing the plain format adds roughly 12s, in a test
that now takes 1m20s to run vs 1m32s. Enabling regress_dump_formats
and adding three more formats counts for 45s of runtime. For a test
that usually shows up as the last one to finish for a heavily
parallelized run. So even the default of "plain" is going to be
noticeable, I am afraid.

+ test_regression_dump_restore($oldnode, %node_params);

Why is this only done for the main regression test suite? Perhaps it
could be useful as well for tests that want to check after their own
custom dumps, as a shortcut?

Linked to that. Could there be some use in being able to pass down a
list of databases to this routine, rather than being limited only to
"regression"? Think extension databases with USE_MODULE_DB that have
unique names.

+ # Dump the original database in "plain" format for comparison later. The
+ # order of columns in COPY statements dumped from the original database and
[...]
+ # Adjust the CREATE TABLE ... INHERITS statements.
+ if ($original)
+ {
+ $dump =~ s/(^CREATE\sTABLE\sgenerated_stored_tests\.gtestxx_4\s\()
+ (\n\s+b\sinteger),
+ (\n\s+a\sinteger)/$1$3,$2/mgx;

The reason why $original exists is documented partially in both
002_pg_upgrade.pl and AdjustDump.pm. It would be better to
consolidate that only in AdjustDump.pm, I guess. Isn't the name
"$original" a bit too general when it comes to applying filters to
the dumps to as the order of the column re-dumped is under control?
Perhaps it would be adapted to use a hash that can be extended with
more than one parameter to control which filters are applied? For
example, imagine a %params where the caller of adjust_dumpfile() can
pass in a "filter_column_order => 1". The filters applied to the dump
are then self-documented. We could do with a second for the
whitespaces, as well.

What's the advantage of testing all the formats? Would that stuff
have been able to catch up more issues related to specific format(s)
when it came to the compression improvements with inheritance?

I'm wondering if it would make sense to also externalize the dump
comparison routine currently in the pg_upgrade script. Perhaps we
should be more ambitious and move more logic into AdjustDump.pm? If
we think that the full cycle of dump -> restore -> dump -> compare
could be used elsewhere, this would limit the footprint of what we are
doing in the pg_upgrade script in this patch and be able to do similar
stuff in out-of-core extensions or other tests. Let's say a
PostgreSQL::Test::Dump.pm?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-10-31 06:17:23 Re: define pg_structiszero(addr, s, r)
Previous Message Bertrand Drouvot 2024-10-31 05:52:49 Re: define pg_structiszero(addr, s, r)