From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: xid wraparound danger due to INDEX_CLEANUP false |
Date: | 2020-04-29 18:28:00 |
Message-ID: | CAH2-WzmjqPZ-R0gyzLxLw1R5p8dWsfc+xTcTU8b2Vzy=suNhow@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Apr 16, 2020 at 11:27 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Sure, there is some pre-existing wraparound danger for individual
> pages. But it's a pretty narrow corner case before INDEX_CLEANUP
> off.
>
> That comment says something about "shared-memory free space map", making
> it sound like any crash would loose the page. But it's a normal FSM
> these days. Vacuum will insert the deleted page into the free space
> map. So either the FSM would need to be corrupted to not find the
> inserted page anymore, or the index would need to grow slow enough to
> not use a page before the wraparound. And then such wrapped around xids
> would exist on individual pages. Not on all deleted pages, like with
> INDEX_CLEANUP false.
Is that really that narrow, even without "INDEX_CLEANUP false"? It's
not as if the index needs to grow very slowly to have only very few
page splits hour to hour (it depends on whether the inserts are random
or not, and so on). Especially if you had a bulk DELETE affecting many
rows, which is hardly that uncommon.
Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from
bpto.xact) when it recycles deleted pages. It simply puts them in the
FSM without changing anything about the page itself. This means
surprisingly little in the context of nbtree: the
_bt_page_recyclable() XID check that takes place in btvacuumpage()
also takes place in _bt_getbuf(), at the point where the page actually
gets recycled by the client. That's not great.
It wouldn't be so unreasonable if btvacuumpage() actually did freeze
the bpto.xact value at the point where it puts the page in the FSM. It
doesn't need to be crash safe; it can work as a hint. Maybe "freezing"
is the wrong word (too much baggage). More like we'd have VACUUM
represent that "this deleted B-Tree page is definitely not considered
to still be a part of the tree by any possible other backend" using a
page flag hint -- btvacuumpage() would "mark the deleted page as
recyclable" explicitly. Note that we still need to keep the original
bpto.xact XID around for _bt_log_reuse_page() (also, do we need to
worry _bt_log_reuse_page() with a wrapped-around XID?).
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2020-04-29 18:41:04 | Re: new heapcheck contrib module |
Previous Message | Tom Lane | 2020-04-29 18:02:34 | Re: SEQUENCE values (duplicated) in some corner cases when crash happens |