From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Петър Славов <pet(dot)slavov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY |
Date: | 2022-05-24 18:15:42 |
Message-ID: | 20220524181542.cmec7rcv3vvvtze4@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi,
On 2022-05-24 09:37:02 -0700, Andres Freund wrote:
> On 2022-05-24 11:02:12 +0200, Alvaro Herrera wrote:
> > I approached it yesterday by running the test case with each
> > set_indexsafe_procflags() callsite commented out, see which one breaks
> > things. Didn't reach any conclusion because I ran into thermal problems
> > with my laptop, which got me angry and couldn't make any further
> > progress.
>
> Do we have any idea what really causes the corruption?
Reproed the problem with the pgbench script, against an existing cluster. One
note: With fsync=on, it's much harder to reproduce.
One instance:
pgbench: error: client 0 script 2 aborted in command 3 query 0: ERROR: heap tuple (37,56) from table "tbl" lacks matching index tuple within index "idx"
HINT: Retrying verification using the function bt_index_parent_check() might provide a more specific error.
the last record for that offset is (with waldump enhanced to print offsets):
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 0, lsn: 0/8E56D218, prev 0/8E56D1F0, desc: PRUNE latestRemovedXid 1878329 nredirected 1 ndead 0 nunused 1, unused: [178], redirected: [56->53], blkref #0: rel 1663/5/27905 blk 37
the tuple originally was inserted:
rmgr: Heap len (rec/tot): 95/ 95, tx: 1802927, lsn: 0/89776380, prev 0/89776358, desc: INSERT off 56 flags 0x04, blkref #0: rel 1663/5/27905 blk 37
rmgr: Heap len (rec/tot): 48/ 48, tx: 1802927, lsn: 0/89776460, prev 0/89776420, desc: HEAP_CONFIRM off 56, blkref #0: rel 1663/5/27905 blk 37
and updated a "while" back:
rmgr: Heap len (rec/tot): 54/ 54, tx: 1804510, lsn: 0/898B5E68, prev 0/898B5E40, desc: LOCK off 56: xid 1804510: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/5/27905 blk 37
rmgr: Heap len (rec/tot): 73/ 73, tx: 1804510, lsn: 0/898B5EA0, prev 0/898B5E68, desc: HOT_UPDATE off 56 xmax 1804510 flags 0x60 ; new off 80 xmax 1804510, blkref #0: rel 1663/5/27905 blk 37
since which there have been numerous redirections to other tuples:
rmgr: Heap2 len (rec/tot): 212/ 212, tx: 0, lsn: 0/89AC6B40, prev 0/89AC6B00, desc: PRUNE latestRemovedXid 1806705 nredirected 16 ndead 0 nunused 47, unused: [47, 61, 12, 14, 38, 52, 53, 60, 75, 16, 17, 31, 50, 33, 59, 76, 49, 51, 54, 63, 89, 95, 18, 19, 29, 32, 48, 62, 81, 85, 88, 106, 28, 39, 40, 44, 26, 27, 45, 58, 73, 91, 34, 41, 80, 86, 96], redirected: [1->74, 3->93, 4->37, 7->94, 8->72, 9->87, 10->105, 15->108, 21->71, 23->98, 30->79, 35->102, 55->92, 56->107, 67->103, 84->104], blkref #0: rel 1663/5/27905 blk 37
rmgr: Heap2 len (rec/tot): 220/ 220, tx: 0, lsn: 0/89DC6240, prev 0/89DC6218, desc: PRUNE latestRemovedXid 1810120 nredirected 27 ndead 0 nunused 29, unused: [74, 88, 93, 19, 94, 53, 59, 105, 16, 29, 118, 108, 71, 26, 38, 98, 79, 52, 102, 31, 54, 92, 107, 50, 113, 103, 104, 60, 85], redirected: [1->122, 3->62, 5->49, 7->91, 10->121, 15->28, 21->51, 23->119, 24->58, 30->73, 33->81, 34->86, 35->89, 36->14, 42->76, 45->112, 48->75, 55->80, 56->116, 67->17, 68->115, 70->61, 78->117, 83->27, 84->18, 97->120, 99->114], blkref #0: rel 1663/5/27905 blk 37
....
rmgr: Heap2 len (rec/tot): 60/ 60, tx: 0, lsn: 0/8E56D218, prev 0/8E56D1F0, desc: PRUNE latestRemovedXid 1878329 nredirected 1 ndead 0 nunused 1, unused: [178], redirected: [56->53], blkref #0: rel 1663/5/27905 blk 37
the target of the redirection that's pruned away in 0/8E56D218 is more recent:
rmgr: Heap len (rec/tot): 54/ 54, tx: 1878329, lsn: 0/8E556070, prev 0/8E556020, desc: LOCK off 178: xid 1878329: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/5/27905 blk 37
rmgr: Heap len (rec/tot): 74/ 74, tx: 1878329, lsn: 0/8E5560A8, prev 0/8E556070, desc: HOT_UPDATE off 178 xmax 1878329 flags 0x60 ; new off 53 xmax 1878329, blkref #0: rel 1663/5/27905 blk 37
I suspect the problem might be related to pruning done during the validation
scan. Once PROC_IN_SAFE_IC is set, the backend itself will not preserve tids
its own snapshot might need. Which will wreak havoc during the validation
scan.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2022-05-24 18:37:05 | Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY |
Previous Message | Peter Geoghegan | 2022-05-24 17:38:14 | Re: BUG #17485: Records missing from Primary Key index when doing REINDEX INDEX CONCURRENTLY |