| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: xid wraparound danger due to INDEX_CLEANUP false |
| Date: | 2020-04-29 20:40:55 |
| Message-ID: | CAH2-WzmAq5hWPEHb4s5cO+c+vYNpn-Ez3ZExUPV2FCD7TuorCA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Apr 29, 2020 at 12:54 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from
> > btpo.xact) when it recycles deleted pages. It simply puts them in the
> > FSM without changing anything about the page itself. This means
> > surprisingly little in the context of nbtree: the
> > _bt_page_recyclable() XID check that takes place in btvacuumpage()
> > also takes place in _bt_getbuf(), at the point where the page actually
> > gets recycled by the client. That's not great.
>
> I think it's quite foolish for btvacuumpage() to not freeze xids. If we
> only do so when necessary (i.e. older than a potential new relfrozenxid,
> and only when the vacuum didn't yet skip pages), the costs are pretty
> minuscule.

I wonder if we should just bite the bullet and mark pages that are
recycled by VACUUM as explicitly recycled, with WAL-logging and
everything (this is like freezing, but stronger). That way, the
_bt_page_recyclable() call within _bt_getbuf() would only be required
to check that state (while btvacuumpage() would use something like a
_bt_page_eligible_for_recycling(), which would do the same thing as
the current _bt_page_recyclable()).

I'm not sure how low the costs would be, but at least we'd only have
to do it once per already-deleted page (i.e. only the first time a
VACUUM operation found that _bt_page_eligible_for_recycling() returned
true for the page and marked it recycled in a crash-safe manner). That
design would be quite a lot simpler, because it expresses the problem
in terms that make sense to the nbtree code. _bt_getbuf() should not
have to make a distinction between "possibly recycled" versus
"definitely recycled".

It makes sense that the FSM is not crash-safe, I suppose, but we're
talking about relatively large amounts of free space here. Can't we
just do it properly/reliably?
--
Peter Geoghegan