LOG: invalid record length at <LSN> : wanted 24, got 0

From: Harinath Kanchu <hkanchu(at)apple(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: LOG: invalid record length at <LSN> : wanted 24, got 0
Date: 2023-03-01 05:21:12
Message-ID: 47509690-AC33-4C8D-8566-D1B9BF662B34@apple.com
Lists: pgsql-hackers

Hello,

We are seeing an interesting STANDBY behavior that happens roughly once every 3-4 days.

The standby suddenly disconnects from the primary and logs the error “LOG: invalid record length at <LSN>: wanted 24, got 0”.
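One way to catch the error quoted in the previous paragraph automatically is to scan the standby log for it. Below is a minimal sketch, assuming the log text contains the message in the form shown in this report; the regex, the helper names (parse_invalid_record_length, lsn_to_int) and the sample LSN 0/3000060 are all hypothetical and only for illustration. It converts the X/Y LSN (upper and lower 32 bits in hex) into one integer, so repeated occurrences can be compared to see whether the standby keeps failing at the same position.

```python
import re
from typing import NamedTuple, Optional

# Matches the message text quoted in this report, e.g.
#   "invalid record length at 0/3000060: wanted 24, got 0"
# Whitespace around the colon is optional so the " : " form also matches.
_INVALID_LEN_RE = re.compile(
    r"invalid record length at ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)\s*:\s*"
    r"wanted (\d+), got (\d+)"
)


class InvalidRecordLength(NamedTuple):
    lsn: int     # LSN as a single 64-bit integer
    wanted: int  # record length the reader expected, as logged
    got: int     # record length actually found (0 in this report)


def lsn_to_int(hi: str, lo: str) -> int:
    """Combine the two hex halves of an X/Y LSN into one 64-bit value."""
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_invalid_record_length(line: str) -> Optional[InvalidRecordLength]:
    """Return the parsed fields, or None if the line is not this error."""
    m = _INVALID_LEN_RE.search(line)
    if m is None:
        return None
    hi, lo, wanted, got = m.groups()
    return InvalidRecordLength(lsn_to_int(hi, lo), int(wanted), int(got))


if __name__ == "__main__":
    # Hypothetical log line; 0/3000060 is illustrative, not from the report.
    print(parse_invalid_record_length(
        "LOG:  invalid record length at 0/3000060: wanted 24, got 0"))
    # prints InvalidRecordLength(lsn=50331744, wanted=24, got=0)
```

The surrounding log line prefix depends on the local log_line_prefix setting, which is why the sketch uses search() rather than anchoring the pattern at the start of the line.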

The standby then tries to restore the WAL file from the archive. Because write activity on the primary is low, the current WAL file is only switched and archived after about 1 hour.

As a result, the standby gets stuck in a loop, switching its WAL source back and forth between STREAM and ARCHIVE without replicating from the primary.

Since this is a synchronous standby, this causes a write outage on the primary.

Suspecting that the WAL file was not being flushed correctly, we changed “wal_sync_method” from “fdatasync” to “fsync”.

However, the problem still occurs after this change.

Restarting the STANDBY sometimes fixes the problem, but detecting the condition automatically is hard: the postgres standby process itself keeps running normally, while the WAL RECEIVER process keeps starting and stopping as the WAL source switches back and forth.
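Because the postmaster stays up while only the WAL receiver comes and goes, a liveness check would need to key on the WAL receiver itself. A minimal sketch of such a check follows, assuming a monitoring job periodically runs something like `SELECT pid FROM pg_stat_wal_receiver` on the standby and records the pid (or None when no row comes back). The Sample type, the helper names, the 600-second window and the threshold of 4 changes are hypothetical choices for illustration, not tuned values.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Sample:
    ts: float           # seconds since epoch when the sample was taken
    pid: Optional[int]  # WAL receiver pid, or None if the view returned no row


def count_pid_changes(samples: Sequence[Sample]) -> int:
    """Count how often the WAL receiver pid differs between consecutive samples.

    A change covers a stop (pid -> None), a start (None -> pid), or a
    restart that happened between two samples (pid -> different pid).
    """
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev.pid != cur.pid)


def is_flapping(samples: Sequence[Sample],
                window_s: float = 600.0,
                max_changes: int = 4) -> bool:
    """Return True if the WAL receiver changed state at least max_changes
    times within the last window_s seconds covered by the samples."""
    if not samples:
        return False
    newest = max(s.ts for s in samples)
    # Keep only recent samples, in time order, before counting changes.
    recent: List[Sample] = sorted(
        (s for s in samples if s.ts >= newest - window_s), key=lambda s: s.ts)
    return count_pid_changes(recent) >= max_changes
```

With samples taken, say, every 30 seconds, a standby in the loop described above would show the pid alternating between values and None, while a healthy standby would show one stable pid. Changes older than the window drop out, so the alert clears once streaming resumes.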

How can we fix this? Any suggestions would be appreciated.

Postgres Version: 13.6
OS: RHEL Linux

Thank you,

Best,
Harinath.
