A failure in t/038_save_logical_slots_shutdown.pl

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: A failure in t/038_save_logical_slots_shutdown.pl
Date: 2024-01-10 08:38:18
Message-ID: CALj2ACVLzH5CN-h9=S26mdRHPuJ9yDLUw70yh4JOiPw03WL0CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been observing a failure in t/038_save_logical_slots_shutdown.pl
of late on my developer system:

t/038_save_logical_slots_shutdown.pl .. 1/?
# Failed test 'Check that the slot's confirmed_flush LSN is the same
as the latest_checkpoint location'
# at t/038_save_logical_slots_shutdown.pl line 35.
# Looks like you failed 1 test of 2.
t/038_save_logical_slots_shutdown.pl .. Dubious, test returned 1
(wstat 256, 0x100)
Failed 1/2 subtests

I did a quick analysis of the failure and commit
https://github.com/postgres/postgres/commit/e0b2eed047df9045664da6f724cb42c10f8b12f0
that introduced this test. I think the issue is that the slot's
confirmed_flush LSN (0/1508000) and shutdown checkpoint LSN
(0/1508018) are not the same:

tmp_check/log/038_save_logical_slots_shutdown_pub.log:

2024-01-10 07:55:44.539 UTC [57621] sub LOG: starting logical
decoding for slot "sub"
2024-01-10 07:55:44.539 UTC [57621] sub DETAIL: Streaming
transactions committing after 0/1508000, reading WAL from 0/1507FC8.
2024-01-10 07:55:44.539 UTC [57621] sub STATEMENT: START_REPLICATION
SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any',
publication_names '"pub"')

ubuntu:~/postgres$ pg17/bin/pg_controldata -D
src/test/recovery/tmp_check/t_038_save_logical_slots_shutdown_pub_data/pgdata/
Database cluster state: in production
pg_control last modified: Wed Jan 10 07:55:44 2024
Latest checkpoint location: 0/1508018
Latest checkpoint's REDO location: 0/1508018

But the tests added by t/038_save_logical_slots_shutdown.pl expects
both LSNs to be same:

sub compare_confirmed_flush
{
# Is it same as the value read from log?
ok( $latest_checkpoint eq $confirmed_flush_from_log,
"Check that the slot's confirmed_flush LSN is the same as the
latest_checkpoint location"
);

I suspect that it's quite not right to expect the slot's
confirmed_flush and latest checkpoint location to be same in the test.
This is because the shutdown checkpoint gets an LSN that's greater
than the slot's confirmed_flush LSN - see the shutdown checkpoint
record getting inserted into WAL after the slot is marked dirty in
CheckPointReplicationSlots().

With this analysis in mind, I think the tests need to do something
like the following:

diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
b/src/test/recovery/t/038_save_logical_slots_shut
down.pl
index 5a4f5dc1d4..d49e6014fc 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -32,7 +32,7 @@ sub compare_confirmed_flush
unless defined($latest_checkpoint);

# Is it same as the value read from log?
- ok( $latest_checkpoint eq $confirmed_flush_from_log,
+ ok( $latest_checkpoint ge $confirmed_flush_from_log,
"Check that the slot's confirmed_flush LSN is the same
as the latest_checkpoint location"
);

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-01-10 09:15:28 Re: Track in pg_replication_slots the reason why slots conflict?
Previous Message Daniel Gustafsson 2024-01-10 08:37:17 Re: Commitfest 2024-01 first week update