Re: Recent 027_streaming_regress.pl hangs

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Recent 027_streaming_regress.pl hangs
Date: 2024-06-04 10:00:00
Message-ID: f748ee55-9e73-3f5e-e879-8865c5e9933a@gmail.com
Lists: pgsql-hackers

Hello Andres,

>> So it looks like the issue resolved, but there is another apparently
>> performance-related issue: deadlock-parallel test failures.
> I reduced test concurrency a bit. I hadn't quite realized how the buildfarm
> config and meson test concurrency interact. But there's still something off
> with the frequency of fsyncs during replay, but perhaps that doesn't qualify
> as a bug.

It looks like that set of animals is still suffering from extreme load.
Please take a look at today's failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-06-04%2002%3A44%3A19

1/1 postgresql:regress-running / regress-running/regress TIMEOUT        3000.06s   killed by signal 15 SIGTERM

inst/logfile ends with:
2024-06-04 03:39:24.664 UTC [3905755][client backend][5/1787:16793] ERROR:  column "c2" of relation "test_add_column" already exists
2024-06-04 03:39:24.664 UTC [3905755][client backend][5/1787:16793] STATEMENT:  ALTER TABLE test_add_column
        ADD COLUMN c2 integer, -- fail because c2 already exists
        ADD COLUMN c3 integer primary key;
2024-06-04 03:39:30.815 UTC [3905755][client backend][5/0:0] LOG: could not send data to client: Broken pipe
2024-06-04 03:39:30.816 UTC [3905755][client backend][5/0:0] FATAL: connection to client lost

"ALTER TABLE test_add_column" is from the alter_table test, which is executed
in group 21 of 25.

Another similar failure:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-05-24%2002%3A22%3A26&stg=install-check-C

1/1 postgresql:regress-running / regress-running/regress TIMEOUT        3000.06s   killed by signal 15 SIGTERM

inst/logfile ends with:
2024-05-24 03:18:51.469 UTC [998579][client backend][7/1792:16786] ERROR:  could not change table "logged1" to unlogged because it references logged table "logged2"
2024-05-24 03:18:51.469 UTC [998579][client backend][7/1792:16786] STATEMENT:  ALTER TABLE logged1 SET UNLOGGED;
(This is the alter_table test again.)

I've analyzed the duration of the regress-running/regress test over the 167
most recent runs on skink and found that the average duration is 1595 seconds,
but there were much longer test runs:
2979.39:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-05-01%2004%3A15%3A29&stg=install-check-C
2932.86:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-04-28%2018%3A57%3A37&stg=install-check-C
2881.78:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2024-05-15%2020%3A53%3A30&stg=install-check-C

So it seems that the default timeout is not large enough under these
conditions. (I've counted 10 such timeout failures in 167 test runs.)
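
For anyone who wants to repeat this kind of tally, here is a minimal sketch in
Python. It is not the script used for the numbers above. It assumes the meson
result lines have already been collected from the stage logs, and that they
have the shape of the TIMEOUT line quoted earlier; the names parse_result and
summarize, the 3000-second limit, and the 5% "near the limit" margin are
assumptions made for illustration.

```python
import re
from statistics import mean

# Matches a meson test result line of the shape quoted in this message, e.g.
# "1/1 postgresql:regress-running / regress-running/regress TIMEOUT  3000.06s  killed by ..."
# Only the OK, FAIL and TIMEOUT statuses are handled in this sketch.
RESULT_RE = re.compile(
    r"^\s*\d+/\d+\s+(?P<name>.+?)\s+(?P<status>OK|FAIL|TIMEOUT)\s+"
    r"(?P<secs>\d+(?:\.\d+)?)s\b"
)


def parse_result(line):
    """Return (test name, status, seconds), or None if the line does not match."""
    m = RESULT_RE.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("status"), float(m.group("secs"))


def summarize(results, limit=3000.0, margin=0.05):
    """Summarize (name, status, seconds) tuples.

    limit and margin are assumptions: runs that did not time out but took at
    least limit * (1 - margin) seconds are counted as "near_limit".
    """
    durations = [secs for _, _, secs in results]
    timeouts = sum(1 for _, status, _ in results if status == "TIMEOUT")
    near_limit = sum(
        1
        for _, status, secs in results
        if status != "TIMEOUT" and secs >= limit * (1 - margin)
    )
    return {
        "runs": len(results),
        "mean": mean(durations),
        "timeouts": timeouts,
        "near_limit": near_limit,
    }


if __name__ == "__main__":
    name = "postgresql:regress-running / regress-running/regress"
    quoted = (
        "1/1 postgresql:regress-running / regress-running/regress "
        "TIMEOUT        3000.06s   killed by signal 15 SIGTERM"
    )
    # Only the handful of durations quoted in this message, not all 167 runs.
    results = [parse_result(quoted)]
    results += [(name, "OK", secs) for secs in (2979.39, 2932.86, 2881.78)]
    print(summarize(results))
```

With the full set of 167 runs as input, the "timeouts" and "near_limit" counts
would show directly how many runs sit at or close to the limit.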

Also, 027_stream_regress still fails for the same reason:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-05-22%2021%3A55%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-05-22%2021%3A54%3A50
(It's remarkable that these two animals failed at the same time.)

Best regards,
Alexander
