Re: Test to dump and restore objects left behind by regression

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test to dump and restore objects left behind by regression
Date: 2025-04-04 16:01:16
Message-ID: 4nuo6wzmidxv7q5qayywazp6iqk7b66psyznjvij3s3jgqoq5p@kxk76xvgwsna
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-04-03 19:14:02 +0200, Alvaro Herrera wrote:
> On 2025-Apr-03, Andres Freund wrote:
>
> > I've increased the timeout even further, but I can't say that I am happy about
> > the slowest test getting even slower. Adding test time in the serially slowest
> > test is way worse than adding the same time in a concurrent test.
>
> Yeah. We discussed strategies to shorten the runtime, but the agreement
> upthread was that we'd look for more elaborate ways to do that
> afterwards.

I think it's not good to just say "we'll maybe somehow fix it in the future",
particularly if the solution is by no means agreed to. If this were a test
that wasn't already the bottleneck for test cycles, it'd would be a different
story, but it is.

I'm already unhappy that 002_pg_upgrade got noticeably slower due to the stats
dump changes. This made it even worse.

> As I mentioned, I can see adding something like PG_TEST_EXCLUDE that we
> could use to suppress this test on slow hosts. Would that work for you?

Not particularly well. For one I don't actually think it's good to exclude it
from something like valgrind testing. But it's also something that wouldn't
really be usable locally, I think, given that we'd be expected to keep the
test working. As outlined below, it really affects the test-hack-test loop
times.

> (We also discussed the fact that this was part of 002_pg_upgrade.pl
> instead of being elsewhere. The reason is that this depends on the
> regression tests having run, and this is the only TAP test that does
> that. Well, this one and 027_stream_regress.pl which is even slower.)

FWIW, for me 027 is actually considerably faster. In an cassert -O0 build (my
normal development env, I find even -Og too problematic for debugging):

pg_upgrade/002_pg_upgrade 96.61s
recovery/027_stream_regress 66.04s

After
git revert 8806e4e8deb1e21715e031e17181d904825a410e abe56227b2e213755dd3e194c530f5f06467bd7c 172259afb563d35001410dc6daad78b250924038

pg_upgrade/002_pg_upgrade 75.09s

Slower by 29%, far from the 3s increased time I saw mentioned somewhere.

And this really affects the overall test time:

All tests before:
real 1m38.173s
user 5m52.500s
sys 4m23.574s

All tests after:
real 2m0.397s
user 5m53.625s
sys 4m30.518s

The CPU time increase is rather minimal, but the wall clock time increase is
22%.

17:
real 1m14.822s
user 4m2.630s
sys 3m22.384s

We regressed wall clock time *60%* from 17->18. Some test cycle increase is
reasonable and can largely be compensated with hardware, but this cycle we're
growing way faster than hardware gets faster. I don't think that's
sustainable.

On valgrind it's also is very much true that 002_pg_upgrade is slower:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2025-04-04%2006%3A53%3A47&stg=recovery-check

276/333 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK 10337.79s 22 subtests passed
233/333 postgresql:recovery / recovery/027_stream_regress OK 7642.90s 9 subtests passed

To look at the times in a bit more detail:

[09:19:31.371](6146.167s) ok 5 - regression tests pass
...
# Running: pg_dump -Fd -j2 --no-sync -d port=31947 host=/tmp/HlLiWBwv6T dbname='regression' --create -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/regression.dump
[09:26:35.178](201.199s) ok 6 - pg_dump on source instance
# Running: pg_restore --create -j2 -d postgres /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/regression.dump
[09:35:53.079](557.901s) ok 7 - pg_restore to destination instance
# Running: pg_dump --no-sync -d port=31947 host=/tmp/HlLiWBwv6T dbname='regression' -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/src_dump.sql
# Running: pg_dump --no-sync -d port=31949 host=/tmp/HlLiWBwv6T dbname='regression' -f /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/dest_dump.sql
[09:43:18.485](445.407s) ok 8 - dump outputs from original and restored regression databases match
# Running: pg_dumpall --no-sync --dbname port=31947 host=/tmp/HlLiWBwv6T dbname='postgres' --file /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_lFqZ/dump1.sql
[09:43:45.895](27.410s) ok 9 - dump before running pg_upgrade

I think the time for "pg_dump on source instance" is somewhat misleading, it
includes initdb and starting the server.

But it's pretty obvious that the newly added steps cost quite a bit of time.

> > I suspect that the test will go a bit faster if log_statement weren't forced
> > on, printing that many log lines, with context, does make valgrind slower,
> > IME. But Cluster.pm forces it to on, and I suspect that putting a global
> > log_statement=false into TEMP_CONFIG would have it's own disadvantages.
>
> I'm sure we can make this change as well somehow, overridding the
> setting just 002_pg_upgrade.pl, as attached. I don't think it's
> relevant for this particular test. The log files go from 21 MB to
> 2.4 MB. It's not nothing ...

That is a nice improvement. I have to run a few errands, can check how that
affects valgrind times of a dump and restore of the regression db.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2025-04-04 16:02:51 Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Previous Message Tom Lane 2025-04-04 16:00:39 Re: Replace IN VALUES with ANY in WHERE clauses during optimization