Random pg_upgrade test failure on drongo

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "'andrew(at)dunslane(dot)net'" <andrew(at)dunslane(dot)net>
Subject: Random pg_upgrade test failure on drongo
Date: 2023-11-08 08:13:06
Message-ID: TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear hackers,

While tracking a buildfarm, I found that drongo failed the test pg_upgrade/003_logical_slots [1].
A strange point is that the test passed in the next iteration. Currently I'm not
sure the reason, but I will keep my eye for it and will investigate if it
happens again.

I think this failure is not related with our logical slots work, whereas it
failed 003_logical_slots.pl. More detail, please see latter part.

For more investigation, a server log during the upgrade may be needed. It will
be in the data directory so BF system will not upload them. I may need additional
information if it failed again.

# Analysis of failure

According to the output, pg_upgrade seemed to be failed while restoring objects
to new cluster[2].

As code-level anaysis, pg_upgrade command failed in exec_prog().
In the function, pg_restore tried to be executed for database "postgres".
Below is a brief call-stack. Note that pg_restore is not used for migrating
logical replication slots, it is done by pg_upgrade binary itself. Also, the
migration is done after all objects are copied, not in create_new_objects().

```
exec_prog()
parallel_exec_prog("pg_restore ... ") <-- Since -j option is not specified, it is just a wrapper
create_new_objects()
main()
```

In exec_prog(), system() system call was called but returned non-zero value.
Doc said that sytem() returns value that is returned by the command interpreter,
when input is not NULL [3]. Unfortunately, current code does not output the
return code. Also, BF system does not upload data directory for failed tests.
Therefore, I could not get more information for the investigation.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2023-11-07%2013%3A43%3A23&stg=pg_upgrade-check
[2]:
```
...
# No postmaster PID for node "oldpub"
# Running: pg_upgrade --no-sync -d C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_oldpub_data/pgdata -D C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_newpub_data/pgdata -b C:/prog/bf/root/HEAD/PGSQL~1.BUI/TMP_IN~1/prog/bf/root/HEAD/inst/bin -B C:/prog/bf/root/HEAD/PGSQL~1.BUI/TMP_IN~1/prog/bf/root/HEAD/inst/bin -s 127.0.0.1 -p 54813 -P 54814 --copy
Performing Consistency Checks
...
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of "C:/prog/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231107T142224.580/log/pg_upgrade_dump_5.log" for
the probable cause of the failure.
Failure, exiting
[14:23:26.632](70.141s) not ok 10 - run of pg_upgrade of old cluster
[14:23:26.632](0.000s) # Failed test 'run of pg_upgrade of old cluster'
# at C:/prog/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 170.
### Starting node "newpub"
# Running: pg_ctl -w -D C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_newpub_data/pgdata -l C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\log/003_logical_slots_newpub.log -o --cluster-name=newpub start
waiting for server to start.... done
server started
# Postmaster PID for node "newpub" is 4604
[14:23:28.398](1.766s) not ok 11 - check the slot exists on new cluster
[14:23:28.398](0.001s) # Failed test 'check the slot exists on new cluster'
# at C:/prog/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 176.
[14:23:28.399](0.000s) # got: ''
# expected: 'regress_sub|t'
...
```
[3]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/system-wsystem?view=msvc-170

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-11-08 08:31:08 Re: Fix use of openssl.path() if openssl isn't found
Previous Message Peter Eisentraut 2023-11-08 07:55:32 Re: pgsql: Clean up role created in new subscription test.