From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-05 21:52:59
Message-ID: CA+TgmoY233jGJphik-hLb56JEDpW0Bks23zi8rq-jmAyiF-L3Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 4:30 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Apr 5, 2022 at 1:10 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I had assumed that this would not be the case, because if the page is
> > being accessed by the workload, it can be pruned - and probably frozen
> > too, if we wanted to write code for that and spend the cycles on it -
> > and if it isn't, pruning and freezing probably aren't needed.
>
> [ a lot of things ]

I don't understand what any of this has to do with the point I was raising here.

> > > But, these same LP_DEAD-heavy tables *also* have a very decent
> > > chance of benefiting from a better index vacuuming strategy, something
> > > *also* enabled by the conveyor belt design. So overall, in either scenario,
> > > VACUUM concentrates on problems that are particular to a given table
> > > and workload, without being hindered by implementation-level
> > > restrictions.
> >
> > Well, this is what I'm not sure about. We need to demonstrate that
> > there are at least some workloads where retiring the LP_DEAD line
> > pointers doesn't become the dominant concern.
>
> It will eventually become the dominant concern. But that could take a
> while, compared to the growth in indexes.
>
> An LP_DEAD line pointer stub in a heap page is 4 bytes. The smallest
> possible B-Tree index tuple is 20 bytes on mainstream platforms (16
> bytes + 4 byte line pointer). Granted, deduplication makes this less
> true, but that's far from guaranteed to help. Also, many tables have
> way more than one index.
>
> Of course it isn't nearly as simple as comparing the bytes of bloat in
> each case. More generally, I don't claim that it's easy to
> characterize which factor is more important, even in the abstract,
> even under ideal conditions -- it's very hard. But I'm sure that there
> are routinely very large differences among indexes and the heap
> structure.

Yeah, I think we need to better understand how this works out.
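
To make the quoted byte arithmetic concrete, here is a rough back-of-the-envelope sketch. It uses only the two sizes given above (4 bytes per LP_DEAD stub, 16 + 4 bytes per minimal B-Tree tuple). Everything else is an assumption for illustration: the function name bloat_bytes, the example counts, and the simplification that every dead heap tuple leaves exactly one minimal, non-deduplicated tuple in every index. As noted above, real workloads are not this simple, so treat the result as an upper-bound-style illustration and not a measurement.

```python
# Sizes taken from the quoted message (mainstream platforms).
LP_DEAD_STUB_BYTES = 4          # heap line pointer stub left for a dead tuple
MIN_BTREE_TUPLE_BYTES = 16 + 4  # smallest index tuple plus its line pointer


def bloat_bytes(dead_tuples, n_indexes):
    """Return (heap_bytes, index_bytes) of bloat under simplifying assumptions.

    Assumptions (hypothetical, not from the source): no deduplication, and
    each dead heap tuple has one minimal-size tuple in each index.
    """
    heap = dead_tuples * LP_DEAD_STUB_BYTES
    index = dead_tuples * n_indexes * MIN_BTREE_TUPLE_BYTES
    return heap, index


# Example counts chosen arbitrarily for illustration.
heap, index = bloat_bytes(1_000_000, 3)
print(f"heap stubs: {heap} bytes, index tuples: {index} bytes, "
      f"ratio: {index / heap:.1f}x")
```

With one index the ratio is 5x by construction, and it scales linearly with the number of indexes; deduplication, or updates that add no new index tuples, would pull it down.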

--
Robert Haas
EDB: http://www.enterprisedb.com
