From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test to dump and restore objects left behind by regression
Date: 2025-03-28 12:27:50
Message-ID: CAExHW5teUDXYR+DyoTP=NJw_=gUy1g=bCmVbKP5+UhRW=Nm0qw@mail.gmail.com
Lists: pgsql-hackers
On Fri, Mar 28, 2025 at 12:20 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Fri, Mar 28, 2025 at 7:07 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Thu, Mar 27, 2025 at 06:15:06PM +0100, Alvaro Herrera wrote:
> > > BTW another idea to shorten this tests's runtime might be to try and
> > > identify which of parallel_schedule tests leave objects behind and
> > > create a shorter schedule with only those (a possible implementation
> > > might keep a list of the slow tests that don't leave any useful object
> > > behind, then filter parallel_schedule to exclude those; this ensures
> > > test files created in the future are still used.)
> >
> > I'm not much a fan of approaches that require an extra schedule,
> > because this is prone to forget the addition of objects that we'd want
> > to cover for the scope of this thread with the dump/restore
> > inter-dependencies, failing our goal of having more coverage. And
> > history has proven that we are quite bad at maintaining multiple
> > schedules for the regression test suite (remember the serial one or
> > the standby one in pg_regress?). So we should really do things so as
> > the schedules are down to a strict minimum: 1.
>
> I see Alvaro's point about using a different and minimal schedule. We
> already have 002_pg_upgrade and 027_stream_ as candidates which could
> use schedules other than default and avoid wasting CPU cycles.
> But I also agree with your opinion that maintaining multiple schedules
> is painful and prone to errors.
>
> What we could do is create the schedule files automatically during
> the build. The automation script would need to know which file goes
> in which schedules. That information could either be part of the SQL
> file itself or live in a separate text file. For example, every SQL
> file could carry a line listing all the schedules it should be part
> of, e.g.
>
> -- schedules: parallel, serial, upgrade
>
> The script would look at every .sql file in a given sql directory
> and create each schedule file from the SQL files that name that
> schedule in their "schedules" annotation. It would flag SQL files
> lacking the annotation, so no newly added file is missed. However,
> we would still miss a SQL file that was not part of a given schedule
> and later acquired changes requiring it to be added to another
> schedule.
>
> If we go this route, we could also make 'make check-tests' better:
> add another annotation, "depends", listing all the SQL files that a
> given SQL file depends upon. make check-tests would then collect all
> the dependencies, sort them, and run them as well.
>
> Of course that's out of scope for this patch. We don't have time left
> for this in PG 18.
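A minimal sketch of the annotation-driven generator quoted above. The "-- schedules:" annotation and the pg_regress "test: name" schedule-line format come from the discussion; the script itself, its file layout, and function names are hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: build pg_regress schedule files from
"-- schedules: ..." annotations embedded in .sql test files."""
import re
import sys
from collections import defaultdict
from pathlib import Path

# Matches an annotation line such as: -- schedules: parallel, serial, upgrade
ANNOTATION = re.compile(r'^--\s*schedules:\s*(.+)$', re.MULTILINE)

def build_schedules(sql_dir):
    """Map each schedule name to the list of tests that opted into it."""
    schedules = defaultdict(list)
    missing = []
    for path in sorted(Path(sql_dir).glob('*.sql')):
        m = ANNOTATION.search(path.read_text())
        if m is None:
            missing.append(path.name)  # flag unannotated files
            continue
        for sched in (s.strip() for s in m.group(1).split(',')):
            schedules[sched].append(path.stem)
    if missing:
        # Fail the build so a newly added test cannot be silently skipped.
        sys.exit("unannotated test files: " + ", ".join(missing))
    return schedules

def write_schedules(schedules, out_dir='.'):
    """Emit one schedule file per schedule, one "test:" line per test."""
    for sched, tests in schedules.items():
        out = Path(out_dir, sched + '_schedule')
        out.write_text(''.join('test: {}\n'.format(t) for t in tests))
```

A real version would need to preserve parallel groups (several tests on one "test:" line) rather than running everything serially, which is where most of the maintenance complexity would remain.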
I spent several hours today examining each SQL file to decide whether
it leaves behind "interesting" objects for the dump/restore test. I
came up with the attached schedule, which may not be accurate since it
would take much more time to examine all the tests thoroughly, but it
should be close enough. With it we save about 6 seconds on my laptop.
If we further compact the schedule by reorganizing the parallel
groups, we may shave off some more seconds.
With no modifications to the parallel schedule:
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.84s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.80s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
41.37s 28 subtests passed
With the attached modified parallel schedule:
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
36.13s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
35.86s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
36.33s 28 subtests passed
1/1 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade OK
36.02s 28 subtests passed
However, coming up with the schedule is a very painful process, and
maintaining it would be even more painful and error-prone. It could
take many days to get the schedule right, and it can become inaccurate
the moment the next SQL file is added or an existing file is modified
to add or drop "interesting" objects.
--
Best Wishes,
Ashutosh Bapat
Attachment: parallel_schedule_dump_restore (application/octet-stream, 4.6 KB)