Logical replication stopped suddenly claiming wal_status lost when max_slot_wal_keep_size was unlimited

From: Dennis White <dwhite(at)seawardmoon(dot)com>
To: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Logical replication stopped suddenly claiming wal_status lost when max_slot_wal_keep_size was unlimited
Date: 2024-08-23 21:29:07
Message-ID: CAE=rie84d==WrWfLMxD-_8Gmavjh33rrRficpK_q0qg3==cC_Q@mail.gmail.com
Lists: pgsql-admin

After running continuously for perhaps a year or more, my project's logical
replication stopped on our test DB this morning, claiming that WAL was lost
due to size limits even though no limit is configured.

The system is running CentOS 7, and I was planning to move to RHEL 8 and
PostgreSQL 14.12 today, but so much for that.

Is this a bug that was fixed in a later release of 14?

Is there some other setting that must be set to get the WAL retained?

Here are the details:

Version:

PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

Log entries (the entries that followed the last one listed just kept
saying the slot was invalid):

2024-08-23 03:07:45.926 UTC [1121] LOG: starting logical decoding for slot "track_subscription"
2024-08-23 03:07:45.926 UTC [1121] DETAIL: Streaming transactions committing after AB17/4A0C9F40, reading WAL from AB17/46D98068.
2024-08-23 03:07:45.926 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"')
2024-08-23 03:07:45.926 UTC [1121] LOG: logical decoding found consistent point at AB17/46D98068
2024-08-23 03:07:45.926 UTC [1121] DETAIL: There are no running transactions.
2024-08-23 03:07:45.926 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"')
2024-08-23 03:08:17.161 UTC [48799] LOG: terminating process 1121 to release replication slot "track_subscription"
2024-08-23 03:08:17.161 UTC [1121] FATAL: terminating connection due to administrator command
2024-08-23 03:08:17.161 UTC [1121] CONTEXT: slot "track_subscription", output plugin "pgoutput", in the change callback, associated LSN AB17/663138F0
2024-08-23 03:08:17.161 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"')
2024-08-23 03:08:17.190 UTC [1121] LOG: disconnection: session time: 0:00:33.502 user=sysrep database=trackdb host=postgresqldb03.s2a.nrl.navy.mil.31.250.132.in-addr.arpa port=36840
2024-08-23 03:08:17.195 UTC [48799] LOG: invalidating slot "track_subscription" because its restart_lsn AB17/4D0E3320 exceeds max_slot_wal_keep_size
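
As a rough aid for reading the LSNs in the log entries above, here is a
minimal Python sketch that converts an LSN to a byte position and measures
the distance between two of them, much as the server-side function
pg_wal_lsn_diff() does. The helper names parse_lsn and lsn_diff_bytes are
made up for this illustration. Note that the "associated LSN" is only the
point logical decoding had reached when the walsender was terminated, not
necessarily the server's current WAL position, so the result is just an
indication of how much WAL lay beyond the slot's restart_lsn.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as 'AB17/4D0E3320' to a 64-bit byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after it the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(newer: str, older: str) -> int:
    """Number of WAL bytes between two LSNs (hypothetical helper)."""
    return parse_lsn(newer) - parse_lsn(older)


# Values copied from the log entries above.
restart_lsn = "AB17/4D0E3320"   # reported when the slot was invalidated
decoding_lsn = "AB17/663138F0"  # "associated LSN" of the terminated walsender

gap = lsn_diff_bytes(decoding_lsn, restart_lsn)
print(f"{gap} bytes (~{gap / 1024**2:.1f} MiB)")  # 421725648 bytes (~402.2 MiB)
```

So at the moment of invalidation, decoding was roughly 400 MiB ahead of the
slot's restart_lsn.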

trackdb=# select * from pg_replication_slots;
     slot_name      |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 track_subscription | pgoutput | logical   |  16386 | trackdb  | f         | f      |            |      |    130568429 |             | AB17/554088B0       | lost       |               | f
(1 row)

trackdb=# show max_slot_wal_keep_size;
 max_slot_wal_keep_size
------------------------
 -1
(1 row)

Thanks,

Dennis
