From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | A failure in t/038_save_logical_slots_shutdown.pl |
Date: | 2024-01-10 08:38:18 |
Message-ID: | CALj2ACVLzH5CN-h9=S26mdRHPuJ9yDLUw70yh4JOiPw03WL0CQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I've been observing a failure in t/038_save_logical_slots_shutdown.pl
of late on my developer system:
t/038_save_logical_slots_shutdown.pl .. 1/?
# Failed test 'Check that the slot's confirmed_flush LSN is the same
as the latest_checkpoint location'
# at t/038_save_logical_slots_shutdown.pl line 35.
# Looks like you failed 1 test of 2.
t/038_save_logical_slots_shutdown.pl .. Dubious, test returned 1
(wstat 256, 0x100)
Failed 1/2 subtests
I did a quick analysis of the failure and commit
https://github.com/postgres/postgres/commit/e0b2eed047df9045664da6f724cb42c10f8b12f0
that introduced this test. I think the issue is that the slot's
confirmed_flush LSN (0/1508000) and shutdown checkpoint LSN
(0/1508018) are not the same:
tmp_check/log/038_save_logical_slots_shutdown_pub.log:
2024-01-10 07:55:44.539 UTC [57621] sub LOG: starting logical
decoding for slot "sub"
2024-01-10 07:55:44.539 UTC [57621] sub DETAIL: Streaming
transactions committing after 0/1508000, reading WAL from 0/1507FC8.
2024-01-10 07:55:44.539 UTC [57621] sub STATEMENT: START_REPLICATION
SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any',
publication_names '"pub"')
ubuntu:~/postgres$ pg17/bin/pg_controldata -D
src/test/recovery/tmp_check/t_038_save_logical_slots_shutdown_pub_data/pgdata/
Database cluster state: in production
pg_control last modified: Wed Jan 10 07:55:44 2024
Latest checkpoint location: 0/1508018
Latest checkpoint's REDO location: 0/1508018
But the tests added by t/038_save_logical_slots_shutdown.pl expects
both LSNs to be same:
sub compare_confirmed_flush
{
# Is it same as the value read from log?
ok( $latest_checkpoint eq $confirmed_flush_from_log,
"Check that the slot's confirmed_flush LSN is the same as the
latest_checkpoint location"
);
I suspect that it's quite not right to expect the slot's
confirmed_flush and latest checkpoint location to be same in the test.
This is because the shutdown checkpoint gets an LSN that's greater
than the slot's confirmed_flush LSN - see the shutdown checkpoint
record getting inserted into WAL after the slot is marked dirty in
CheckPointReplicationSlots().
With this analysis in mind, I think the tests need to do something
like the following:
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
b/src/test/recovery/t/038_save_logical_slots_shut
down.pl
index 5a4f5dc1d4..d49e6014fc 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -32,7 +32,7 @@ sub compare_confirmed_flush
unless defined($latest_checkpoint);
# Is it same as the value read from log?
- ok( $latest_checkpoint eq $confirmed_flush_from_log,
+ ok( $latest_checkpoint ge $confirmed_flush_from_log,
"Check that the slot's confirmed_flush LSN is the same
as the latest_checkpoint location"
);
Thoughts?
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2024-01-10 09:15:28 | Re: Track in pg_replication_slots the reason why slots conflict? |
Previous Message | Daniel Gustafsson | 2024-01-10 08:37:17 | Re: Commitfest 2024-01 first week update |