The 026_overwrite_contrecord test might fail on extremely slow animals

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: The 026_overwrite_contrecord test might fail on extremely slow animals
Date: 2025-03-30 09:00:00
Message-ID: 9ffdb19a-7a89-424e-925a-dd981c37f0ba@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

A couple of buildfarm failures produced by skink, which runs
tests under Valgrind: [1], [2], with the following diagnostics:
 31/324 postgresql:recovery / recovery/026_overwrite_contrecord ERROR            325.72s   (exit status 255 or signal
127 SIGinvalid)

# Initializing node "standby" from backup "backup" of node "primary"
### Enabling streaming replication for node "standby"
### Starting node "standby"
# Running: pg_ctl -w -D
/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/data/t_026_overwrite_contrecord_standby_data/pgdata
-l
/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/log/026_overwrite_contrecord_standby.log
-o --cluster-name=standby start
waiting for server to start........ stopped waiting
pg_ctl: could not start server
Examine the log output.
# pg_ctl start failed; see logfile for details:
/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/026_overwrite_contrecord/log/026_overwrite_contrecord_standby.log
# No postmaster PID for node "standby"
[14:44:39.051](7.936s) Bail out!  pg_ctl start failed

026_overwrite_contrecord_standby.log:
2025-03-14 14:44:38.533 UTC [1558222][startup][:0] LOG:  database system was interrupted; last known up at 2025-03-14
14:44:30 UTC
2025-03-14 14:44:38.806 UTC [1558222][startup][:0] LOG:  invalid checkpoint record
2025-03-14 14:44:38.808 UTC [1558222][startup][:0] PANIC:  could not locate a valid checkpoint record at 0/2094248
2025-03-14 14:44:38.937 UTC [1555866][postmaster][:0] LOG:  startup process (PID 1558222) was terminated by signal 6:
Aborted
2025-03-14 14:44:38.937 UTC [1555866][postmaster][:0] LOG:  aborting startup due to startup process failure

026_overwrite_contrecord_primary.log:
2025-03-14 14:44:30.536 UTC [1553014][client backend][4/2:0] LOG: statement: SELECT
pg_walfile_name(pg_current_wal_insert_lsn())
2025-03-14 14:44:30.621 UTC [1519069][checkpointer][:0] LOG: checkpoint complete: wrote 109 buffers (85.2%), wrote 3
SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=12.404 s, sync=0.013 s, total=12.574 s; sync files=28,
longest=0.004 s, average=0.001 s; distance=8299 kB, estimate=8299 kB; lsn=0/2094248, redo lsn=0/1FA5A48
2025-03-14 14:44:31.119 UTC [1519026][postmaster][:0] LOG:  received immediate shutdown request

indicates that this test might fail when a checkpoint performed during the
test execution (that is, when the duration of the test up to
$node->stop('immediate');
exceeds 300 secs).

This failure can be easily reproduced with:
--- a/src/test/recovery/t/026_overwrite_contrecord.pl
+++ b/src/test/recovery/t/026_overwrite_contrecord.pl
@@ -24,2 +24,3 @@ autovacuum = off
 wal_keep_size = 1GB
+checkpoint_timeout=30s
 ));
@@ -61,2 +62,3 @@ $node->safe_psql('postgres',
 #$node->safe_psql('postgres', qq{create table foo ()});
+sleep(40);
 my $endfile = $node->safe_psql('postgres',

The last 5 runs of the test on skink on master show the following duration of the test:
 29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 317.37s   3 subtests passed
 29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 412.78s   3 subtests passed
 29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 448.46s   3 subtests passed
 29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 445.73s   3 subtests passed
 29/331 postgresql:recovery / recovery/026_overwrite_contrecord  OK 383.50s   3 subtests passed

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-09%2020%3A23%3A09 - REL_17_STABLE
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-14%2013%3A52%3A09 - master

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2025-03-30 10:14:01 Re: Options to control remote transactions’ access/deferrable modes in postgres_fdw
Previous Message Gurjeet Singh 2025-03-30 06:46:49 Minor edits to README.tuplock, and a question