From: | Ben Chobot <bench(at)silentmedia(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: 12.3 replicas falling over during WAL redo |
Date: | 2020-08-03 22:42:06 |
Message-ID: | 9ba2cbd8-fa0b-cdd9-3eea-26b5418f20ce@silentmedia.com |
Lists: | pgsql-general |
Alvaro Herrera wrote on 8/3/20 2:34 PM:
> On 2020-Aug-03, Ben Chobot wrote:
>
>> Alvaro Herrera wrote on 8/3/20 12:34 PM:
>>> On 2020-Aug-03, Ben Chobot wrote:
>>>
>>> Yep. Looking at the ones in block 6501,
>>>
>>>> rmgr: Btree len (rec/tot): 72/ 72, tx: 76393394, lsn: A0A/AB2C43D0, prev A0A/AB2C4378, desc: INSERT_LEAF off 41, blkref #0: rel 16605/16613/60529051 blk 6501
>>>> rmgr: Btree len (rec/tot): 72/ 72, tx: 76396065, lsn: A0A/AC4204A0, prev A0A/AC420450, desc: INSERT_LEAF off 48, blkref #0: rel 16605/16613/60529051 blk 6501
>>> My question was whether the block has received the update that added the
>>> item in offset 41; that is, is the LSN in the crashed copy of the page
>>> equal to A0A/AB2C43D0? If it's an older value, then the write above was
>>> lost for some reason.
>> How do I tell?
> You can use pageinspect's page_header() function to obtain the page's
> LSN. You can use dd to obtain the page from the file,
>
> dd if=16605/16613/60529051 bs=8192 count=1 seek=6501 of=/tmp/page.6501
If I use skip instead of seek (seek= offsets into the output file, skip= into the input), and carry on with the rest:
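For reference, here is the same single-block extraction that `dd bs=8192 skip=6501 count=1` performs, written as a minimal Python sketch. It assumes the default 8 kB block size and 1 GB segment files. Both are build-time settings, and `SHOW block_size` and `SHOW segment_size` report the actual values. Block 6501 starts at byte 53,256,192, so it sits in the first segment file. `read_block` is a hypothetical helper, not part of any PostgreSQL tool.

```python
BLCKSZ = 8192            # assumed default block size (SHOW block_size)
RELSEG_BLOCKS = 131072   # assumed blocks per 1 GB segment file (SHOW segment_size)


def read_block(rel_path: str, blkno: int) -> bytes:
    """Return the raw bytes of block `blkno` of a relation, like
    dd if=<rel_path> bs=8192 skip=<blkno> count=1."""
    # Blocks past the first segment live in files named <rel_path>.1, .2, ...
    segno, blk_in_seg = divmod(blkno, RELSEG_BLOCKS)
    path = rel_path if segno == 0 else f"{rel_path}.{segno}"
    with open(path, "rb") as f:
        f.seek(blk_in_seg * BLCKSZ)  # offset into the INPUT file (dd's skip=)
        page = f.read(BLCKSZ)
    if len(page) != BLCKSZ:
        raise ValueError(f"block {blkno} is past the end of {path}")
    return page
```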
> then put that binary file in a bytea column, perhaps like
>
> create table page (raw bytea);
> insert into page select pg_read_binary_file('/tmp/page.6501');
>
> and with that you can run page_header:
>
> create extension pageinspect;
> select h.* from page, page_header(raw) h;
     lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
--------------+----------+-------+-------+-------+---------+----------+---------+-----------
 A0A/99BA11F8 |     -215 |     0 |   180 |  7240 |    8176 |     8192 |       4 |         0
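page_header() only decodes the fixed 24-byte PageHeaderData at the start of the page. As a rough cross-check without pageinspect, here is a Python sketch of that decoding. It assumes a little-endian server (e.g. x86-64; a big-endian one would need `>` in the format string) and the version 4 layout reported above. `page_header` here is a hypothetical reimplementation, not pageinspect's code.

```python
import struct

# PageHeaderData, 24 bytes, assuming little-endian byte order:
# pd_lsn as (xlogid uint32, xrecoff uint32), then pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version (uint16 each),
# then pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")


def page_header(page: bytes) -> dict:
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, size_version, prune_xid) = HEADER.unpack_from(page, 0)
    return {
        "lsn": f"{xlogid:X}/{xrecoff:X}",  # same X/Y hex form as pg_lsn
        # pageinspect shows the checksum as a signed smallint, hence -215
        "checksum": checksum - 0x10000 if checksum >= 0x8000 else checksum,
        "flags": flags,
        "lower": lower,
        "upper": upper,
        "special": special,
        "pagesize": size_version & 0xFF00,  # high byte holds the page size
        "version": size_version & 0x00FF,   # low byte holds the layout version
        "prune_xid": prune_xid,
    }
```

Feeding it the 8192 bytes in /tmp/page.6501 should reproduce the row above.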
As I understand it, this means the WAL stream expects this page to have
been last modified by the record at A0A/AB2C43D0, but the page itself says
it was last modified at A0A/99BA11F8, which is older. So at least one
write to the page is missing?
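The comparison can be done mechanically. An LSN written X/Y is a 64-bit WAL position, with X as the high 32 bits and Y as the low 32 bits, both in hex. Here is a short Python sketch using the three LSNs from this thread (`parse_lsn` is a hypothetical helper):

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL 'X/Y' LSN string into a 64-bit WAL byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


page_lsn = parse_lsn("A0A/99BA11F8")    # page_header() LSN of the replica's blk 6501
off41_lsn = parse_lsn("A0A/AB2C43D0")   # start of INSERT_LEAF off 41 into blk 6501
off48_lsn = parse_lsn("A0A/AC4204A0")   # start of INSERT_LEAF off 48 into blk 6501

# Applying a record moves the page LSN past that record's start, so a page LSN
# still below off41_lsn means the off-41 insert never reached this copy of the page.
off41_missing = page_lsn < off41_lsn
wal_gap_bytes = off41_lsn - page_lsn    # WAL distance between the two positions
print(off41_missing, wal_gap_bytes)     # prints: True 292696536
```

Inside the server, the pg_lsn type supports the same comparison directly, e.g. `select 'A0A/99BA11F8'::pg_lsn < 'A0A/AB2C43D0'::pg_lsn;`.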