Re: Test to dump and restore objects left behind by regression

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Test to dump and restore objects left behind by regression
Date: 2024-11-14 10:46:28
Message-ID: CAExHW5uO4kcwiGYogpFYg1ytWrnXuB8AL2acuOFB4tRXNFejEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 7, 2024 at 3:59 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> Hi Tom and Michael,
>
> Thanks for your inputs.
>
> I am replying to all the comments in a single email arranging related
> comments together.
>
> On Thu, Oct 31, 2024 at 11:26 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On my laptop, testing the plain format adds roughly 12s, in a test
> > that now takes 1m20s to run vs 1m32s. Enabling regress_dump_formats
> > and adding three more formats counts for 45s of runtime. For a test
> > that usually shows up as the last one to finish for a heavily
> > parallelized run. So even the default of "plain" is going to be
> > noticeable, I am afraid.
>
> > On Thu, Oct 31, 2024 at 10:26:01AM -0400, Tom Lane wrote:
> > > I'd be okay with adding it in a form where the default behavior
> > > is to do no additional checking.
>
> If I run the test alone, it takes 45s (master) vs 54s (with patch) on
> my laptop. These readings are similar to what you have observed. The
> restore step by itself takes most of the time, even if a. we eliminate
> data, b. use formats other than plain or c. use --jobs=2. Hence I am
> fine with Tom's suggestion i.e. default behaviour is to do no
> additional testing. I propose to test all dump formats (including
> plain) only when PG_TEST_EXTRA has "regress_dump_tests".

Done.

> But see next
>
> >
> > What's the advantage of testing all the formats? Would that stuff
> > have been able to catch up more issues related to specific format(s)
> > when it came to the compression improvements with inheritance?
>
> I haven't caught any more issues with formats other than "plain". It
> is more for future-proof testing. I am fine if we want to test just
> plain dump format for now. Adding more formats would be easier if
> required.

Not done for now. Given that the 'directory' formats dumps the tables
in separate directories, and thus has some impact on how child tables
would be dumped and restored, I think we should at least have plain
and directory tested in this test. But I will wait for other opinion
before removing formats other than plain.

>
> >> Whether that's worth maintaining
> > > is hard to say though.
> >
> > In terms of maintenance, it would be nice if we are able to minimize
> > the code added to the pg_upgrade suite, so as it would be simple to
> > switch this code elsewhere if need be.
>
> I think Tom hints at maintenance of
> AdjustDump::adjust_dump_for_restore(). In future, if the difference
> between dump from the original database and that from the restored
> database grows, we will need to update
> AdjustDump::adjust_dump_for_restore() accordingly. That will be some
> maintenance. But the person introducing such changes will get a chance
> to fix them if unintentional. That balances out any maintenance
> efforts, I think.

I added a test in AdjustDump::adjust_dump_for_restore() to make sure
that the column order adjustment is indeed applied. Thus now the test
will fail when
a. the adjustment is not needed anymore, in which we could remove
adjustment logic
b. more adjustments are required

Interestingly, I have caught a new difference in dump from original
and restored database. See the difference between attached plain dump
files. I will start a new thread to see if this difference is
legitimate. Had this test been part of core, we would have caught it
earlier.

Because of this difference, the test is failing. I will wait for the
conclusion on the other thread before adding more adjustments.

>
> We could turn the two invocations of pg_dump for comparison (in the
> patch) into a routine if that helps. It might shave a few lines of
> code. Since the routine won't be general, it should reside in
> 002_pg_upgrade where it is used.

Done. Added a function to take dump output from given server and
adjust it. The function is used for both original and restored
database. Shaves a handful lines and deduplicates the logic to take
dump and adjust. I like the end result.

>
> I agree that "original" is a generic name. And I like your suggestion
> partly. I will rename it as "adjust_column_order".
>

Done.

Also added AdjustDump.pm to the list of modules to be installed in
meson.build and Makefile.

All these changes are part of 0001 patch now.

> >
> > I'm wondering if it would make sense to also externalize the dump
> > comparison routine currently in the pg_upgrade script.
> > - Comparison of two dumps, with potentially filters applied to them,
> > with diff printed. In a new module.
>
> It is a good idea to externalize the compare_dump() function in
> PostgreSQL::Test::Utils. Similar code exists in
> 002_compare_backups.pl. 027_stream_regress.pl also uses compare() to
> compare dump files but it uses `diff` command for the same. We can
> change both usages to use compare_dump().
>

I have made that change in 0002 patch. The resultant code looks
better, it standardizes the way we compare dumps and report
differences if any. As a bonus, the dump files being compared are
"noted" in regress_log_* so that it's easy to locate them for
debugging and investigation. New
PostgreSQL::Test::Utils::compare_dumps() routine compares the contents
of given two dump files. If the files do not match it will print the
difference along with the paths of files. If the files match, it will
"note" the paths. Three tests use this routine now
002_compare_backups, 002_pg_upgrade and 027_stream_regress. With this
change 002_pg_upgrade will start "note"ing path of dump files which it
didn't do before. The resultant output in regress_log_... file is more
useful, I think

```
ok 16 - dump outputs of original and restored regression database,
using format 'tar', match
# first dump file:
/masked-path/build/dev/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_RIwT/src_dump.sql_adjusted
# second dump file:
/masked-path/build/dev/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_RIwT/dest_dump.tar.sql_adjusted
```
027_stream_regress used command_ok + diff for the same purpose. But I
don't see a reason, in relevant thread [1], why it can't use the new
routine instead.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK-%2Bmg6RWiDu0JudF6jWeL5%2BgPmi8EKUm1eAzmdbwiE_A%40mail.gmail.com#8a9dab584fb2d28d10645ac58e7e55d3

I am not against the other suggestions to make the functions, code
added by this patch more general and extensible. But without an
example or case for such generalization and/or extensibility, it's
hard to get it right. And the functions and code are isolated enough
that we could generalize and extend them if the need arises.

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
0002-Add-PostreSQL-Test-Utils-compare_dumps-20241114.patch text/x-patch 4.8 KB
0001-Test-pg_dump-restore-of-regression-objects-20241114.patch text/x-patch 13.5 KB
dest_dump.plain.sql_adjusted application/octet-stream 491.2 KB
src_dump.sql_adjusted application/octet-stream 491.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2024-11-14 10:48:02 Potential ABI breakage in upcoming minor releases
Previous Message Shlok Kyal 2024-11-14 10:20:48 Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY