Re: should vacuum's first heap pass be read-only?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-01 04:08:00
Message-ID: CAFiTN-ugw=K=kgxRV0ZYbuXi4ysJt47rA3hWYGN4f4dzSf20YQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 1, 2022 at 1:55 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>

> But having said that, coming back to this with fresh eyes, I think
> Dilip has a really good point here. If the first thing we do at the
> start of every VACUUM is scan the heap in a way that is guaranteed to
> rediscover all of the dead TIDs that we've previously added to the
> conveyor belt plus maybe also new ones, we may as well just forget the
> whole idea of having a conveyor belt at all. At that point we're just
> talking about a system for deciding when to skip index vacuuming, and
> the conveyor belt is a big complicated piece of machinery that stores
> data we don't really need for anything because if we threw it out the
> next vacuum would reconstruct it anyhow and we'd get exactly the same
> results with less code.

After thinking more about this, I see that there is some value in
remembering the dead TIDs in the conveyor belt. The case I have in mind
is a table with multiple indexes, where we do the index vacuum for some
of the indexes and skip it for others. When we later run a complete
vacuum cycle, the heap scan will again find all the old dead TIDs plus
the new dead TIDs. Without the conveyor belt, if all of those TIDs do
not fit in maintenance_work_mem, we might need multiple index vacuum
passes even for the indexes that we already vacuumed in the previous
cycle. With the conveyor belt, we remember the conveyor belt pageno up
to which we have done the index vacuum, so for those indexes we only
need to vacuum the remaining TIDs, which will definitely reduce the
number of index vacuuming passes, right?
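
To make the pass-count argument concrete, here is a rough sketch in
Python. It is only an illustration under stated assumptions, not the
proposed design: the ConveyorBelt class, the per-index "vacuumed_upto"
mark (standing in for the conveyor belt pageno), the index names, and
the TID counts are all hypothetical, and memory is modeled simply as a
fixed number of TIDs that fit in one index pass.

```python
from math import ceil


def index_passes(num_tids: int, tids_per_pass: int) -> int:
    """Index scans needed when only tids_per_pass TIDs fit in memory at once."""
    return ceil(num_tids / tids_per_pass)


class ConveyorBelt:
    """Hypothetical append-only log of dead TIDs with a per-index progress mark."""

    def __init__(self, index_names):
        self.tids = []
        # Position in the log up to which each index has been vacuumed
        # (a stand-in for the conveyor belt pageno mentioned above).
        self.vacuumed_upto = {name: 0 for name in index_names}

    def append(self, new_tids):
        self.tids.extend(new_tids)

    def pending(self, index_name):
        """Dead TIDs this index has not yet been vacuumed for."""
        return self.tids[self.vacuumed_upto[index_name]:]

    def mark_vacuumed(self, index_name):
        self.vacuumed_upto[index_name] = len(self.tids)


belt = ConveyorBelt(["idx_a", "idx_b"])

# First cycle: 1000 dead TIDs found; idx_a is vacuumed, idx_b is skipped.
belt.append(range(1000))
belt.mark_vacuumed("idx_a")

# Second cycle: 200 new dead TIDs are added.
belt.append(range(1000, 1200))

tids_per_pass = 500  # assumed memory limit, expressed in TIDs

# With the belt, each index only processes what it has not seen yet.
with_belt = {
    name: index_passes(len(belt.pending(name)), tids_per_pass)
    for name in belt.vacuumed_upto
}

# Without the belt, every index must process old + new TIDs again.
without_belt = {
    name: index_passes(len(belt.tids), tids_per_pass)
    for name in belt.vacuumed_upto
}

print("with belt:   ", with_belt)
print("without belt:", without_belt)
```

In this toy setup idx_a needs one pass with the belt instead of three
without it, while idx_b (which was skipped earlier) needs three passes
either way.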

So my stand is: a) for global indexes we definitely need the conveyor
belt to remember the partition-wise dead TIDs (because after
vacuuming certain partitions, when we go for global index vacuuming we
don't want to rescan all the partitions to get the same dead items),
and b) even without global indexes there are advantages to storing dead
items in the conveyor belt, as explained in my previous paragraph. So
I think it is worth adding the conveyor belt infrastructure.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
