Re: Streaming replica failure

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Streaming replica failure
Date: 2021-04-30 11:03:50
Message-ID: CAM+6J94qefLYTuAy7uki7vs-LZ+c3eGi6ccimfGPZ8OsOUH1Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Were there any issues with hardware ?
Memory/storage ?
I am not sure but it look like data loss to me.

If you have admin access?

Can you run dmesg -a, system logs, hyperion logs etc and see if there are
any errors related to memory corruption.
Coz if there are hardware issues, this will happen again IMHO.
But if this is not a hardware issue, then I guess one of the experts may
have to chime in.

Thanks,
Vijay

On Tue, Apr 27, 2021, 3:31 PM Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com> wrote:

> Hello,
>
> we are using PostgreSQL 12.4 on CentOS 7. The hot standby failed:
>
> 2021-04-24 09:19:27 CEST [20956]: [747-1] user=,db=,host=,app= LOG:
> recovery restart point at 3D8C/352B4CE8
> 2021-04-24 09:19:27 CEST [20956]: [748-1] user=,db=,host=,app= DETAIL:
> Last completed transaction was at log time 2021-04-24 09:19:27.221313+02.
> 2021-04-24 09:20:57 CEST [20956]: [749-1] user=,db=,host=,app= LOG:
> restartpoint starting: time
> 2021-04-24 09:24:27 CEST [20956]: [750-1] user=,db=,host=,app= LOG:
> restartpoint complete: wrote 171233 buffers (13.4%); 0 WAL file(s) added,
> 68 removed, 0 recycled; write
> =209.663 s, sync=0.012 s, total=209.963 s; sync files=283, longest=0.001
> s, average=0.000 s; distance=684762 kB, estimate=684762 kB
> 2021-04-24 09:24:27 CEST [20956]: [751-1] user=,db=,host=,app= LOG:
> recovery restart point at 3D8C/5EF6B858
> 2021-04-24 09:24:27 CEST [20956]: [752-1] user=,db=,host=,app= DETAIL:
> Last completed transaction was at log time 2021-04-24 09:24:27.288115+02.
> 2021-04-24 09:25:15 CEST [20955]: [11-1] user=,db=,host=,app= WARNING:
> page 366603 of relation base/20955/10143636 is uninitialized
> 2021-04-24 09:25:15 CEST [20955]: [12-1] user=,db=,host=,app= CONTEXT:
> WAL redo at 3D8C/D79F6500 for Heap2/VISIBLE: cutoff xid 3806260577 flags
> 0x01
> 2021-04-24 09:25:15 CEST [20955]: [13-1] user=,db=,host=,app= PANIC: WAL
> contains references to invalid pages
> 2021-04-24 09:25:15 CEST [20955]: [14-1] user=,db=,host=,app= CONTEXT:
> WAL redo at 3D8C/D79F6500 for Heap2/VISIBLE: cutoff xid 3806260577 flags
> 0x01
> 2021-04-24 09:25:16 CEST [20953]: [11-1] user=,db=,host=,app= LOG:
> startup process (PID 20955) was terminated by signal 6: Neúspěšně ukončen
> (SIGABRT)
> 2021-04-24 09:25:16 CEST [20953]: [12-1] user=,db=,host=,app= LOG:
> terminating any other active server processes
>
> The relation base/20955/10143636 is a standard table.
>
> Version details:
> postgres=# select version();
> version
>
>
> ---------------------------------------------------------------------------------------------------------
> PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-39), 64-bit
> (1 row)
>
> Settings:
> postgres=# select name, setting, category from pg_settings where category
> = 'Write-Ahead Log / Settings';
> name | setting | category
> ------------------------+-----------+----------------------------
> commit_delay | 0 | Write-Ahead Log / Settings
> commit_siblings | 5 | Write-Ahead Log / Settings
> fsync | on | Write-Ahead Log / Settings
> full_page_writes | on | Write-Ahead Log / Settings
> synchronous_commit | on | Write-Ahead Log / Settings
> wal_buffers | 2048 | Write-Ahead Log / Settings
> wal_compression | off | Write-Ahead Log / Settings
> wal_init_zero | on | Write-Ahead Log / Settings
> wal_level | logical | Write-Ahead Log / Settings
> wal_log_hints | off | Write-Ahead Log / Settings
> wal_recycle | on | Write-Ahead Log / Settings
> wal_sync_method | fdatasync | Write-Ahead Log / Settings
> wal_writer_delay | 200 | Write-Ahead Log / Settings
> wal_writer_flush_after | 128 | Write-Ahead Log / Settings
> (14 rows)
>
> pg_waldump output:
> -bash-4.2$ /usr/pgsql-12/bin/pg_waldump 0000000100003D8C000000D7 2>&1 |
> tail
> rmgr: Btree len (rec/tot): 64/ 64, tx: 3812802559, lsn:
> 3D8C/D7CF5A98, prev 3D8C/D7CF5A58, desc: INSERT_LEAF off 360, blkref #0:
> rel 1663/20955/11280092 blk 19
> 509
> rmgr: Transaction len (rec/tot): 46/ 46, tx: 3812802557, lsn:
> 3D8C/D7CF5AD8, prev 3D8C/D7CF5A98, desc: COMMIT 2021-04-24 09:25:16.160687
> CEST
> rmgr: Heap len (rec/tot): 159/ 159, tx: 3812802559, lsn:
> 3D8C/D7CF5B08, prev 3D8C/D7CF5AD8, desc: INSERT off 8 flags 0x08, blkref
> #0: rel 1663/20955/11280066 bl
> k 165603
> rmgr: Btree len (rec/tot): 64/ 64, tx: 3812802559, lsn:
> 3D8C/D7CF5BA8, prev 3D8C/D7CF5B08, desc: INSERT_LEAF off 317, blkref #0:
> rel 1663/20955/11280073 blk 15
> 340
> rmgr: Btree len (rec/tot): 64/ 64, tx: 3812802559, lsn:
> 3D8C/D7CF5BE8, prev 3D8C/D7CF5BA8, desc: INSERT_LEAF off 130, blkref #0:
> rel 1663/20955/11280091 blk 22
> 003
> rmgr: Heap len (rec/tot): 80/ 80, tx: 3812802559, lsn:
> 3D8C/D7CF5C28, prev 3D8C/D7CF5BE8, desc: INSERT off 61 flags 0x08, blkref
> #0: rel 1663/20955/11280082 b
> lk 38917
> rmgr: Btree len (rec/tot): 64/ 64, tx: 3812802559, lsn:
> 3D8C/D7CF5C78, prev 3D8C/D7CF5C28, desc: INSERT_LEAF off 73, blkref #0: rel
> 1663/20955/11280088 blk 421
> 25
> rmgr: Btree len (rec/tot): 64/ 64, tx: 3812802559, lsn:
> 3D8C/D7CF5CB8, prev 3D8C/D7CF5C78, desc: INSERT_LEAF off 361, blkref #0:
> rel 1663/20955/11280092 blk 19
> 509
> rmgr: Transaction len (rec/tot): 46/ 46, tx: 3812802559, lsn:
> 3D8C/D7CF5CF8, prev 3D8C/D7CF5CB8, desc: COMMIT 2021-04-24 09:25:16.161036
> CEST
> pg_waldump: fatal: error in WAL record at 3D8C/D7CF5CF8: invalid record
> length at 3D8C/D7CF5D28: wanted 24, got 0
> -bash-4.2$
>
> Emails like "BUG #13822: Slave terminated - WAL contains references to
> invalid page" in the archive are quite old (2015).
>
> While we can create a new replica, ios there something we can do to
> diagnose the root cause of this failure?
>
> Thanks for any hints or advice.
>
> Kind regards Ales Zeleny
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2021-04-30 11:11:45 Re: -1/0 virtualtransaction
Previous Message Laurenz Albe 2021-04-30 11:00:17 Re: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)