From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg15b3: recovery fails with wal prefetch enabled |
Date: | 2022-09-01 02:48:38 |
Message-ID: | 20220901024837.GD31833@telsasoft.com |
Lists: | pgsql-hackers |
Some more details, in case they're important:
First: the server has wal_compression=zstd (I wonder if something
doesn't allow/accommodate compressed FPI?)
I thought to mention that after compiling pg15 locally and forgetting to
use --with-zstd.
I compiled it to enable your debug logging, which wrote these during
recovery:
< 2022-08-31 21:17:01.807 CDT >NOTICE: suppressing prefetch in relation 1663/16888/165958212 from block 156 until 1201/1C3965A0 is replayed, which truncates the relation
< 2022-08-31 21:17:01.903 CDT >NOTICE: suppressing prefetch in relation 1663/16888/165958523 from block 23 until 1201/1C39CC98 is replayed, which truncates the relation
< 2022-08-31 21:17:02.029 CDT >NOTICE: suppressing prefetch in relation 1663/16888/165958523 from block 23 until 1201/1C8643C8 is replayed, because the relation is too small
Also, pg_waldump seems to fail early with -w:
[pryzbyj(at)template0 ~]$ sudo /usr/pgsql-15/bin/pg_waldump -w -R 1663/16881/2840 -F vm -p /mnt/tmp/15/data/pg_wal 00000001000012010000001C
rmgr: Heap2 len (rec/tot): 64/ 122, tx: 0, lsn: 1201/1CAF2658, prev 1201/1CAF2618, desc: VISIBLE cutoff xid 3681024856 flags 0x01, blkref #0: rel 1663/16881/2840 fork vm blk 0 FPW, blkref #1: rel 1663/16881/2840 blk 54
pg_waldump: error: error in WAL record at 1201/1CD90E48: invalid record length at 1201/1CD91010: wanted 24, got 0
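For reference, the LSNs in the NOTICE lines and in the pg_waldump error all appear to fall within the one segment file passed to pg_waldump. A minimal sketch of that mapping, assuming the default 16MB wal_segment_size and timeline 1 (the function name is made up for illustration and is not part of PostgreSQL):

```python
# Sketch: map an LSN such as "1201/1CAF2658" to its WAL segment file name.
# Assumptions: default 16MB wal_segment_size, timeline 1.
# wal_segment_name is a hypothetical helper, not a PostgreSQL API.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


def wal_segment_name(lsn: str, timeline: int = 1,
                     seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-hex-digit WAL file name that contains the given LSN."""
    hi_hex, lo_hex = lsn.split("/")
    # An LSN is a 64-bit byte position printed as two 32-bit hex halves.
    pos = (int(hi_hex, 16) << 32) | int(lo_hex, 16)
    segno = pos // seg_size
    # Number of segments covered by one value of the high 32 bits.
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


if __name__ == "__main__":
    for lsn in ("1201/1C3965A0", "1201/1C39CC98", "1201/1C8643C8",
                "1201/1CAF2658", "1201/1CD90E48", "1201/1CD91010"):
        print(lsn, wal_segment_name(lsn))
```

Under those assumptions every LSN quoted above maps to 00000001000012010000001C, which is the file named on the pg_waldump command line.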
Also, the VM has crashed with OOM before, while running pg15, with no issue in
recovery. I haven't been able to track down the cause.
The VM is running: kernel-3.10.0-1160.66.1.el7.x86_64
pgsql is an ext4 FS (no tablespaces), which is a qemu block device
exposed like:
<driver name='qemu' type='raw' cache='none' io='native'/>
<target dev='vdg' bus='virtio'/>
It's nowhere near full:
/dev/vdc 96G 51G 46G 53% /var/lib/pgsql