From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: BUG #17245: Index corruption involving deduplicated entries |
Date: | 2021-10-29 18:45:34 |
Message-ID: | CAH2-WzkTd6wHXhxn=hZT3NYwgoKNHaC5sDUwswtUo7Ay-VdR4w@mail.gmail.com |
Lists: | pgsql-bugs |
On Fri, Oct 29, 2021 at 11:36 AM Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp> wrote:
> The newly manifested issue is caught by pg_amcheck:
>
> btree index "azurlane_wiki.mediawiki.page_main_title":
> ERROR: item order invariant violated for index "page_main_title"
> DETAIL: Lower index tid=(17,157) (points to heap tid=(540,5))
> higher index tid=(17,158) (points to heap tid=(540,5)) page lsn=2/A019DD78.
Great!
I'm not surprised to see that it's the page table, once again. It's
not particularly big, right? Are there other tables that are much
larger?
> The weird part about this is that the WAL archive does not seem to
> contain any data for 157 and 158 above (in 1663/19243/274869 blk 17).
> The last two entries are
>
> rmgr: Btree len (rec/tot): 53/ 4885, tx: 2085600, lsn: 2/A0195AE0, prev 2/A01943F0, desc: INSERT_LEAF off 155, blkref #0: rel 1663/19243/274869 blk 17 FPW
> rmgr: Btree len (rec/tot): 72/ 72, tx: 2085602, lsn: 2/A019DD30, prev 2/A019DCF0, desc: INSERT_LEAF off 156, blkref #0: rel 1663/19243/274869 blk 17
>
> The WAL file in data14/pg_wal does not have anything related to 157 and
> 158 for this filenode/blk as well.
If this were a heap relation then that expectation would hold, because
the offset number of a heap tuple needs to be stable, at least within a
"VACUUM cycle" (otherwise indexes would point to the wrong things).
However, this relation is a B-Tree index, where TIDs/page offset
numbers are not stable at all.
Almost all individual index tuple inserts onto a B-Tree page put the
new index tuple "between" existing index tuples. This will "shift"
whatever index tuples are to the right of the position of the new
tuple. For example, with "INSERT_LEAF off 156", the atomic insert
operation will shift any existing index tuple at page offset 156 to
page offset 157, and any index tuple that was at page offset 157 will
go to 158. And so on. So the tuples now at offsets 157 and 158 could
have been inserted at lower offsets and shifted there later, without
any WAL record that mentions 157 or 158.
We don't physically shift the index tuples themselves, but we do shift
the item ID/line pointer array at the start of the page, so it's not
too expensive.
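To make the shifting concrete, here is a toy model in Python. It is only
a sketch: ToyPage, insert_at and item are hypothetical names invented for
illustration, and none of this is PostgreSQL's actual page code. It
assumes a page is just a small array of line pointers plus a tuple area
whose entries never move once written.

```python
class ToyPage:
    """Toy page: a line pointer array plus tuple storage (illustrative only)."""

    def __init__(self):
        self.line_pointers = []  # list position i holds offset number i + 1
        self.tuple_storage = []  # tuple bodies; appended, never moved

    def insert_at(self, offnum, tup):
        """Make tup the item at 1-based offset number offnum."""
        if not 1 <= offnum <= len(self.line_pointers) + 1:
            raise ValueError("offset number out of range")
        # The tuple body goes into free space and stays where it is.
        self.tuple_storage.append(tup)
        slot = len(self.tuple_storage) - 1
        # Only the line pointer array is shifted to make room; every item
        # at or after offnum ends up with an offset number one higher.
        self.line_pointers.insert(offnum - 1, slot)

    def item(self, offnum):
        """Return the tuple currently at 1-based offset number offnum."""
        return self.tuple_storage[self.line_pointers[offnum - 1]]


# Build a page with 157 items, then insert a new one at offset 156,
# mirroring "INSERT_LEAF off 156".
page = ToyPage()
for i in range(1, 158):
    page.insert_at(i, f"key_{i:03d}")

page.insert_at(156, "new")

print(page.item(156), page.item(157), page.item(158))
# prints: new key_156 key_157
```

The items that end up at offsets 157 and 158 got there purely as a side
effect of the insert at 156. In this model their bodies in tuple_storage
did not move, only their line pointers did.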
--
Peter Geoghegan