Re: Replication is stuck

From: Ninad Shah <ninad(dot)shah(at)percona(dot)com>
To: Murthy Nunna <mnunna(at)fnal(dot)gov>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Replication is stuck
Date: 2024-06-23 12:38:10
Message-ID: CAMtEjOZx_VaekDK_f8BW8Pj4E-VbAJ5MC0TTqWKoRsMBsEmVjw@mail.gmail.com
Lists: pgsql-admin

Your WAL file is corrupted. It's not possible to restore.
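
The numbers in the pg_waldump error can be decoded by hand, which helps narrow down where the damage sits. Below is a minimal sketch, assuming the default 16 MB WAL segment size (the 16777216-byte files in your listing suggest that, but check your cluster) and timeline 1 as in your file names. The helper name locate_lsn is mine, not a PostgreSQL function.

```python
# Assumption: default 16 MB WAL segments, as the 16777216-byte files suggest.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def locate_lsn(lsn, timeline=1, seg_size=WAL_SEGMENT_SIZE):
    """Map an LSN such as '13D94/FFBFFF48' to (segment file name, byte offset)."""
    hi_str, lo_str = lsn.split("/")
    hi, lo = int(hi_str, 16), int(lo_str, 16)
    # Number of segments covered by one value of the high 32 bits of the LSN.
    segs_per_xlogid = 0x100000000 // seg_size
    segno = hi * segs_per_xlogid + lo // seg_size
    name = "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )
    return name, lo % seg_size


name, offset = locate_lsn("13D94/FFBFFF48")
print(name, offset)        # 0000000100013D94000000FF 12582728
print(12582912 - offset)   # 184
print(12582912 % 8192)     # 0
```

So the record pg_waldump was reading starts 184 bytes before offset 12582912, and 12582912 is exactly 12 MiB into the segment, on a page boundary. "invalid magic number 0000" means the page header read at that offset was zero. That would be consistent with the tail of the file having been zero-filled, for example by an incomplete copy, but that is a guess from the numbers and not a diagnosis.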

Thanks,

--

<https://www.percona.com/>

Ninad Shah
PostgreSQL DBA I, Managed Services

e: ninad(dot)shah(at)percona(dot)com

w: www.percona.com

Databases Run Better With Percona

On Sun, Jun 23, 2024 at 6:04 PM Murthy Nunna <mnunna(at)fnal(dot)gov> wrote:

> Thanks, Ninad. It looks like there is an error in 0000000100013D94000000FF.
> Is there any way to tell whether this is logical or physical corruption? In
> other words, is this file system corruption, or did Postgres generate a
> corrupted file?
>
>
>
> pg_waldump -q 0000000100013D94000000FE
>
> [no errors]
>
>
>
> pg_waldump -q 0000000100013D94000000FF
>
> pg_waldump: fatal: error in WAL record at 13D94/FFBFFF48: invalid magic
> number 0000 in log segment 0000000100013D94000000FF, offset 12582912
>
>
>
> pg_waldump -q 0000000100013D9500000000
>
> [no errors]
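
One way to see how much of the segment is affected is to look at the first two bytes of every WAL page, since that is where the magic number lives. The sketch below assumes the default 8192-byte WAL block size; the function name pages_with_zero_magic and the path in the usage line are hypothetical. It only reports pages whose magic bytes are zero and does not validate anything else about the page.

```python
def pages_with_zero_magic(path, page_size=8192):
    """Return byte offsets of pages whose first two bytes (the magic) are zero.

    Assumption: page_size matches the cluster's WAL block size (8192 by default).
    """
    offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            head = f.read(2)
            if len(head) < 2:
                break  # reached end of file
            if head == b"\x00\x00":
                offsets.append(offset)
            offset += page_size
    return offsets


# Hypothetical usage against a copy of the suspect segment:
# print(pages_with_zero_magic("/path/to/0000000100013D94000000FF")[:5])
```

If the same segment is still available somewhere else (the primary's pg_wal, or a second archive), running this on both copies and comparing may show whether the zeros were already there when the primary wrote the file or appeared during archiving. If every page from 12582912 onward is reported in the archived copy only, the copy step would be the first place I would look.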
>
>
>
>
>
> *From:* Ninad Shah <ninad(dot)shah(at)percona(dot)com>
> *Sent:* Sunday, June 23, 2024 7:16 AM
> *To:* Murthy Nunna <mnunna(at)fnal(dot)gov>
> *Cc:* pgsql-admin(at)postgresql(dot)org
> *Subject:* Re: Replication is stuck
>
>
>
> Hi Murthy,
>
>
>
> Would you please generate a pg_waldump of
> 0000000100013D94000000FF, 0000000100013D94000000FE
> and 0000000100013D9500000000?
>
>
> Thanks,
>
> --
>
>
> <https://www.percona.com/>
>
> *Ninad Shah*
> PostgreSQL DBA I, *Managed Services*
>
> *e:* ninad(dot)shah(at)percona(dot)com
>
> *w:* www.percona.com
>
> *Databases Run Better With Percona*
>
>
>
>
>
> On Sun, Jun 23, 2024 at 5:32 PM Murthy Nunna <mnunna(at)fnal(dot)gov> wrote:
>
> I am running pg14.4. I use WAL replication on a standby server that is
> kept 7 days behind the primary (recovery_min_apply_delay = 7d).
>
>
>
> My replication is stuck. It looks like it is repeatedly applying the same
> WAL file. The next WAL files are definitely there.
>
>
>
> I restarted the cluster, but that didn’t fix the issue.
>
>
>
> I would appreciate any help you can provide before I rebuild the standby.
> I am trying to find the root cause. If 0000000100013D94000000FF is
> corrupted, how can we tell?
>
>
>
> 2024-06-23 06:54:57 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:02 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:07 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:12 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:17 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:22 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:27 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:32 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:37 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
> 2024-06-23 06:55:42 CDT []LOG: restored log file
> "0000000100013D94000000FF" from archive
>
>
>
>
>
> There are no missing WALs:
>
>
>
> ls -ltr 0000000100013D95000000* |more
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:39
> 0000000100013D9500000000
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:39
> 0000000100013D9500000001
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:39
> 0000000100013D9500000002
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:39
> 0000000100013D9500000003
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:40
> 0000000100013D9500000004
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:40
> 0000000100013D9500000005
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:40
> 0000000100013D9500000006
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:40
> 0000000100013D9500000007
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:40
> 0000000100013D9500000008
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:40
> 0000000100013D9500000009
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:41
> 0000000100013D950000000A
>
> -rw------- 1 postgres postgres 16777216 Jun 14 19:41
> 0000000100013D950000000B
>
>
>
>
>
>
>
>
