dikkop failed the pg_combinebackupCheck/006_db_file_copy.pl test

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: dikkop failed the pg_combinebackupCheck/006_db_file_copy.pl test
Date: 2024-07-29 04:00:00
Message-ID: 877b1f23-35d2-31b2-2fcd-d176fd3d05c4@gmail.com
Lists: pgsql-hackers

Hello Tomas,

Please take a look at a recent failure on dikkop [1]. The
regress_log_006_db_file_copy file from that run shows:
[02:08:57.929](0.014s) # initializing database system by copying initdb template
...
[02:09:22.511](24.583s) ok 1 - full backup
...
[02:10:35.758](73.247s) not ok 2 - incremental backup

006_db_file_copy_primary.log contains:
2024-07-28 02:09:29.441 UTC [67785:12] 006_db_file_copy.pl LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:09:29.441 UTC [67785:13] 006_db_file_copy.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:09:29.441 UTC [67785:14] 006_db_file_copy.pl LOG: acquired physical replication slot "pg_basebackup_67785"
2024-07-28 02:09:29.441 UTC [67785:15] 006_db_file_copy.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
2024-07-28 02:10:29.487 UTC [67785:16] 006_db_file_copy.pl LOG: terminating walsender process due to replication timeout
2024-07-28 02:10:29.487 UTC [67785:17] 006_db_file_copy.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_67785" 0/4000000 TIMELINE 1
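
As a quick cross-check of the timestamps in the primary log, here is a minimal Python sketch that computes the gap between the slot acquisition and the walsender termination. The two timestamps are copied from the log excerpt; the helper name elapsed_seconds is just an illustration. The 60s it is compared against assumes wal_sender_timeout was left at its default for this test, which I have not verified against the test's configuration.

```python
from datetime import datetime

# Timestamps copied from the 006_db_file_copy_primary.log excerpt.
slot_acquired = "2024-07-28 02:09:29.441"
walsender_terminated = "2024-07-28 02:10:29.487"

# Assumption: wal_sender_timeout is at its default value of 60 seconds.
ASSUMED_TIMEOUT_S = 60.0

FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


gap = elapsed_seconds(slot_acquired, walsender_terminated)
print(f"gap between slot acquisition and termination: {gap:.3f}s")
print(f"exceeds assumed timeout: {gap > ASSUMED_TIMEOUT_S}")
```

The gap comes out just above 60 seconds, which is consistent with the walsender being terminated as soon as the timeout expired.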

It looks like this incremental backup ran slower than usual: it took more
than 60 seconds and was apparently interrupted by wal_sender_timeout.
However, regress_log_006_db_file_copy from the six previous (successful)
test runs shows:
[14:22:16.841](43.215s) ok 2 - incremental backup
[02:14:42.888](34.595s) ok 2 - incremental backup
[17:51:16.152](43.708s) ok 2 - incremental backup
[04:07:16.757](31.087s) ok 2 - incremental backup
[12:15:01.256](49.432s) ok 2 - incremental backup
[01:06:02.482](52.364s) ok 2 - incremental backup

Thus, reaching 60s (e.g., due to some background activity) on this animal
seems quite plausible. So maybe it would make sense to increase
wal_sender_timeout for it, say, to 120s?
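
To make the headroom argument concrete, here is a rough Python sketch that extracts the step durations from the six successful runs listed above and compares the slowest one with the 60s and 120s limits. The names DURATION_RE and step_durations are illustrative only, and 60s is assumed to be the effective wal_sender_timeout. Note also that the step duration covers the whole incremental backup step, so it only approximates the interval the walsender actually measures.

```python
import re

# "ok 2 - incremental backup" lines from the six previous successful runs.
LOG = """\
[14:22:16.841](43.215s) ok 2 - incremental backup
[02:14:42.888](34.595s) ok 2 - incremental backup
[17:51:16.152](43.708s) ok 2 - incremental backup
[04:07:16.757](31.087s) ok 2 - incremental backup
[12:15:01.256](49.432s) ok 2 - incremental backup
[01:06:02.482](52.364s) ok 2 - incremental backup
"""

CURRENT_TIMEOUT_S = 60.0    # assumption: default wal_sender_timeout
PROPOSED_TIMEOUT_S = 120.0  # value suggested in this message

# Matches the "(NN.NNNs)" step duration that follows the timestamp.
DURATION_RE = re.compile(r"\]\((\d+\.\d+)s\)")


def step_durations(text: str) -> list[float]:
    """Extract per-step durations, in seconds, from TAP log lines."""
    return [float(m.group(1)) for m in DURATION_RE.finditer(text)]


times = step_durations(LOG)
slowest = max(times)
headroom_now = CURRENT_TIMEOUT_S - slowest
headroom_proposed = PROPOSED_TIMEOUT_S - slowest

print(f"runs: {len(times)}, fastest: {min(times):.3f}s, slowest: {slowest:.3f}s")
print(f"headroom at {CURRENT_TIMEOUT_S:.0f}s: {headroom_now:.3f}s")
print(f"headroom at {PROPOSED_TIMEOUT_S:.0f}s: {headroom_proposed:.3f}s")
```

With these numbers the slowest successful run leaves under 8 seconds of margin at 60s, so a modest slowdown is enough to hit the limit.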

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-07-27%2023%3A22%3A57

Best regards,
Alexander
