Re: xid_wraparound tests intermittent failure.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <msawada(at)postgresql(dot)org>
Subject: Re: xid_wraparound tests intermittent failure.
Date: 2024-07-21 21:36:48
Message-ID: 732722.1721597808@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 2024-07-21 Su 1:34 PM, Tom Lane wrote:
>> Locally, I've not managed to reproduce the failure yet; so perhaps
>> there is some platform dependency. What are you testing on?

> Linux ub22arm 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:08:40 UTC
> 2024 aarch64 aarch64 aarch64 GNU/Linux
> It's a VM running on UTM/Apple Silicon

Hmm, doesn't sound like that ought to be slow.

I did manage to reproduce dodo's failures by running xid_wraparound
manually on mamba's very slow host:

$ time make -s installcheck PROVE_FLAGS=--timer
# +++ tap install-check in src/test/modules/xid_wraparound +++
[13:37:49] t/001_emergency_vacuum.pl .. 1/? # poll_query_until timed out executing this query:
#
# SELECT NOT EXISTS (
# SELECT *
# FROM pg_database
# WHERE age(datfrozenxid) > current_setting('autovacuum_freeze_max_age')::int)
#
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 4 just after 1.
[13:37:49] t/001_emergency_vacuum.pl .. Dubious, test returned 4 (wstat 1024, 0x400)
All 1 subtests passed
[14:06:51] t/002_limits.pl ............ 2/? # Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 2.
[14:06:51] t/002_limits.pl ............ Dubious, test returned 29 (wstat 7424, 0x1d00)
All 2 subtests passed
[14:31:16] t/003_wraparounds.pl ....... ok 7564763 ms ( 0.00 usr 0.01 sys + 13.82 cusr 9.26 csys = 23.09 CPU)
[16:37:21]

Test Summary Report
-------------------
t/001_emergency_vacuum.pl (Wstat: 1024 (exited 4) Tests: 1 Failed: 0)
Non-zero exit status: 4
Parse errors: No plan found in TAP output
t/002_limits.pl (Wstat: 7424 (exited 29) Tests: 2 Failed: 0)
Non-zero exit status: 29
Parse errors: No plan found in TAP output
Files=3, Tests=4, 10772 wallclock secs ( 0.15 usr 0.06 sys + 58.50 cusr 59.88 csys = 118.59 CPU)
Result: FAIL
make: *** [../../../../src/makefiles/pgxs.mk:442: installcheck] Error 1
10772.99 real 59.34 user 60.14 sys

Each of those two failures looks just like something that dodo has
shown at one time or another. So it's at least plausible that
"slow machine" is the whole explanation. I'm still wondering
though if there's some effect that causes the test's runtime to
be unstable in itself, sometimes leading to timeouts.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alena Rybakina 2024-07-21 22:53:51 Re: POC, WIP: OR-clause support for indexes
Previous Message Thomas Munro 2024-07-21 21:26:32 Re: CI, macports, darwin version problems