Re: optimize file transfer in pg_upgrade

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, bruce(at)momjian(dot)us
Subject: Re: optimize file transfer in pg_upgrade
Date: 2025-04-28 15:15:05
Message-ID: aA-beRzxOK-GhvaD@nathan
Lists: pgsql-hackers

On Sun, Apr 27, 2025 at 05:00:01PM +0300, Alexander Lakhin wrote:
> Both happened on Windows, but what's worse is that the failure logs
> contain no information on the exact reason. We can see:
> #   Failed test 'pg_upgrade with transfer mode --swap: stdout matches'
> #   at C:/tools/xmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/006_transfer_modes.pl line 61.
> ...
> # Restoring database schemas in the new cluster
> # *failure*

I see a couple of other pg_upgrade failures on drongo and fairywren that
look similar, although these are for different tests:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-03-10%2019%3A26%3A35
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-03-30%2013%3A03%3A05

> Moreover, even when pg_upgrade succeeds, IPC::Run::run inside
> command_ok_or_fails_like() returns false, as we can see from a
> successful test run:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=fairywren&dt=2025-04-27%2001%3A03%3A06&stg=misc-check
>
> pgsql.build/testrun/pg_upgrade/006_transfer_modes/log/regress_log_006_transfer_modes
> [01:18:38.210](21.036s) ok 1 - pg_upgrade with transfer mode --clone: stdout matches
> [01:18:38.211](0.001s) ok 2 - pg_upgrade with transfer mode --clone: stderr matches

That's expected on platforms that don't support every transfer mode. In
that case, the test instead verifies that the output matches a known error
message.

> So maybe it's worth adjusting the test somehow so that useful logs are
> left behind after a failure?

I see some other discussion about failures with similar symptoms [0] [1].
Commit 6f97ef0 seems to have helped with one of the tests, and the latest
thread [2] has a proposed patch that, AFAICT, aims to fix the underlying
issue.

[0] https://postgr.es/m/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[1] https://postgr.es/m/CALDaNm3tjY44HoSwY84%3DXGEbTg0ruVfD4hAMTm%3DTgBqVysH4Qw%40mail.gmail.com
[2] https://postgr.es/m/CALDaNm2y%2Bnf-V9tjKwvbPprobZs1t_UrcCpJ0qYD5-KkOUFAyg%40mail.gmail.com

--
nathan
