Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-02-20 03:47:32
Message-ID: 20220220034732.cchlk25elmechd4k@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-02-19 19:31:21 -0800, Peter Geoghegan wrote:
> On Sat, Feb 19, 2022 at 6:16 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > Given that heap_surgery's raison d'etre is correcting corruption etc, I think
> > > it makes sense for it to do as minimal work as possible. Iterating through a
> > > HOT chain would be a problem if you e.g. tried to repair a page with HOT
> > > corruption.
> >
> > I guess that's also true. There is at least a legitimate argument to
> > be made for not leaving behind any orphaned heap-only tuples. The
> > interface is a TID, and so the user may already believe that they're
> > killing the heap-only, not just the root item (since ctid suggests
> > that the TID of a heap-only tuple is the TID of the root item, which
> > is kind of misleading).
>
> Actually, I would say that heap_surgery's raison d'etre is making
> weird errors related to corruption of this or that TID go away, so
> that the user can cut their losses. That's how it's advertised.

I'm not that sure those are that different... Imagine some corruption leading
to two HOT chains ending in the same TID, which our fancy new secure pruning
algorithm might detect.
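
To make that scenario concrete, here is a minimal toy model in C. It is not
PostgreSQL code: the next[] array, the 1-based offsets, and the function name
toy_chain_links_corrupt are all assumptions made up for illustration. It only
shows how a single pass over the per-item links can notice two chains
converging on the same item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CHAIN_MAX_ITEMS 16
#define TOY_CHAIN_NO_NEXT   0   /* offsets are 1-based; 0 means "chain ends here" */

/*
 * Toy model (hypothetical, not the real pruning code).
 *
 * next[off] is the offset that item `off` links to on the same page, or
 * TOY_CHAIN_NO_NEXT. The array must have nitems + 1 entries; entry 0 is unused.
 *
 * Returns true if any item is the target of more than one link (two chains
 * ending in the same item), or if a link points outside the page.
 */
static bool
toy_chain_links_corrupt(const int next[], size_t nitems)
{
    bool        seen[TOY_CHAIN_MAX_ITEMS + 1] = {false};

    assert(nitems <= TOY_CHAIN_MAX_ITEMS);

    for (size_t off = 1; off <= nitems; off++)
    {
        int         target = next[off];

        if (target == TOY_CHAIN_NO_NEXT)
            continue;           /* last member of its chain */
        if (target < 1 || (size_t) target > nitems)
            return true;        /* link leaves the page */
        if (seen[target])
            return true;        /* second predecessor for the same item */
        seen[target] = true;
    }
    return false;
}
```

With links 1 -> 2 -> 3 the function reports nothing; with 1 -> 3 and 2 -> 3 it
reports corruption, because item 3 has two predecessors.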

Either way, I'm a bit surprised about the logic to not allow killing redirect
items? What if you have a redirect pointing to an unused item?

> Let's assume that we don't want to make VACUUM/pruning just treat
> orphaned heap-only tuples as DEAD, regardless of their true HTSV-wise
> status

I don't think that'd ever be a good idea. Those tuples are visible to a
seqscan after all.

> -- let's say that we want to err in the direction of doing
> nothing at all with the page. Now we have to have a weird error in
> VACUUM instead (not great, but better than just spinning between
> lazy_scan_prune and heap_prune_page).

Non-DEAD orphaned versions shouldn't cause a problem in lazy_scan_prune(). The
problem here is DEAD orphaned HOT tuples, and those we should be able to
delete with the new page pruning logic, right?

I think it might be worth getting rid of the need for the retry approach by
reusing the same HTSV status array between heap_prune_page and
lazy_scan_prune. Then the only legitimate reason for seeing a DEAD item in
lazy_scan_prune() would be some form of corruption. And it'd be a pretty
decent performance boost, HTSV ain't cheap.
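
The shape of that idea can be sketched with a toy model in C. None of this is
the actual vacuumlazy.c or pruneheap.c code: ToyPage, ToyStatus, toy_prune and
toy_scan_after_prune are hypothetical names, and the "visibility check" is a
stand-in that only counts how often it is called. The point is that the
status is computed once per item during pruning and the second pass reads the
saved array instead of recomputing it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 16

typedef enum { TOY_LIVE, TOY_DEAD } ToyStatus;

typedef struct ToyPage
{
    size_t      nitems;
    int         is_dead[TOY_MAX_ITEMS];  /* what a fresh check would say */
    int         removed[TOY_MAX_ITEMS];  /* set once pruning removes the item */
} ToyPage;

/* Counts calls to the (notionally expensive) visibility check. */
static int  toy_visibility_calls = 0;

static ToyStatus
toy_check_visibility(const ToyPage *page, size_t i)
{
    toy_visibility_calls++;
    return page->is_dead[i] ? TOY_DEAD : TOY_LIVE;
}

/* Pruning pass: check each item once, save the result, remove DEAD items. */
static void
toy_prune(ToyPage *page, ToyStatus cache[])
{
    for (size_t i = 0; i < page->nitems; i++)
    {
        cache[i] = toy_check_visibility(page, i);
        if (cache[i] == TOY_DEAD)
            page->removed[i] = 1;
    }
}

/*
 * Second pass: reuse the saved statuses rather than checking again.
 * Returns how many surviving items the cache says are DEAD; since pruning
 * removed every such item, anything nonzero would indicate corruption.
 */
static size_t
toy_scan_after_prune(const ToyPage *page, const ToyStatus cache[])
{
    size_t      unexpected_dead = 0;

    for (size_t i = 0; i < page->nitems; i++)
    {
        if (page->removed[i])
            continue;
        if (cache[i] == TOY_DEAD)
            unexpected_dead++;
    }
    return unexpected_dead;
}
```

In this model, an item whose status would change between the two passes does
not matter: the second pass never asks again, so there is nothing to retry,
and the visibility check runs exactly once per item.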

Greetings,

Andres Freund
