Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-18 15:42:53
Message-ID: CA+TgmoZSST7Pf=o_Y6o-7WdUBQdZLj4YbaQu9qs5j_AgLqmRaw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 18, 2024 at 10:09 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The problem with your justification for moving things in that
> direction (if any) is that it is occasionally not quite true: there
> are at least some cases where line pointer truncation after making a
> page's LP_DEAD items -> LP_UNUSED will actually matter. Plus
> PageGetHeapFreeSpace() will return 0 if and when
> "PageGetMaxOffsetNumber(page) > MaxHeapTuplesPerPage &&
> !PageHasFreeLinePointers(page)". Of course, nothing stops you from
> compensating for this by anticipating what will happen later on, and
> assuming that the page already has that much free space.

I think we're agreeing but I want to be sure. If we only set LP_DEAD
items to LP_UNUSED, that frees no space. But if doing so allows us to
truncate the line pointer array, then that frees a little bit of
space. Right?
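
To make sure we're picturing the same mechanism, here's a toy
standalone model of the accounting - not the real
PageTruncateLinePointerArray() code, just its effect:

#include <stdio.h>

#define LP_UNUSED 0
#define LP_NORMAL 1
#define LP_DEAD   3
#define LP_SIZE   4             /* sizeof(ItemIdData) */

int
main(void)
{
    /* line pointer states for a small page; the tail is all dead */
    int lp[] = {LP_NORMAL, LP_DEAD, LP_NORMAL, LP_DEAD, LP_DEAD, LP_DEAD};
    int total = sizeof(lp) / sizeof(lp[0]);
    int n = total;
    int i;

    /* step 1: mark LP_DEAD items LP_UNUSED; this frees nothing */
    for (i = 0; i < n; i++)
        if (lp[i] == LP_DEAD)
            lp[i] = LP_UNUSED;

    /* step 2: truncate trailing LP_UNUSED entries; only these bytes
     * become reusable space on the page */
    while (n > 0 && lp[n - 1] == LP_UNUSED)
        n--;

    printf("freed %d bytes of line pointer array\n", (total - n) * LP_SIZE);
    return 0;
}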

One problem with using this as a justification for the status quo is
that truncating the line pointer array is a relatively recent
behavior. It's certainly much newer than the choice to have VACUUM
touch the FSM in the second heap pass rather than the first.

Another problem is that the amount of space that we're freeing up in
the second pass is really quite minimal even when it's >0. Any tuple
that actually contains any data at all is at least 32 bytes, and most
of them are quite a bit larger. Line pointers (ItemIdData) are 4
bytes. To save enough space to fit even one additional tuple, we'd
have to free *at least* 8 line pointers. That's going to be really
rare.
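
The arithmetic, spelled out (a standalone sketch; assumes an 8K page
and sizeof(ItemIdData) == 4):

#include <stdio.h>

int
main(void)
{
    int sizes[] = {32, 64, 128};    /* plausible heap tuple sizes */
    int i;

    /* truncated line pointers needed before one more tuple fits */
    for (i = 0; i < (int) (sizeof(sizes) / sizeof(sizes[0])); i++)
        printf("tuple of %3d bytes needs %2d freed line pointers\n",
               sizes[i], sizes[i] / 4);
    return 0;
}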

And even if it happens, is it even useful to advertise that free
space? Do we want to cram one more tuple into a page that has a
history of extremely heavy updates? Could it be that it's smarter to
just forget about that free space? You've written before about the
stupidity of cramming tuples of different generations into the same
page, and that concept seems to apply here. When we heap_page_prune(),
we don't know how much time has elapsed since the page was last
modified - but if we're lucky, it might not be very much. Updating the
FSM at that time gives us some shot at filling up the page with data
created around the same time as the existing page contents. By the
time we vacuum the indexes and come back, that temporal locality is
definitely lost.
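
In code terms, what I'm describing is mostly just moving the
RecordPageWithFreeSpace() call from the second heap pass up to prune
time. A sketch with real function names but everything interesting
elided (the function name here is made up, and this isn't meant to
compile as-is):

/*
 * First heap pass, at prune time.  Instead of leaving the FSM update
 * to lazy_vacuum_heap_page() after index vacuuming, advertise the
 * page's free space now, while new inserts are still temporally close
 * to the page's surviving tuples.
 */
static void
scan_prune_sketch(Relation rel, Buffer buf, BlockNumber blkno)
{
    Page        page = BufferGetPage(buf);

    /* ... prune and freeze, remember dead items, as today ... */

    RecordPageWithFreeSpace(rel, blkno, PageGetHeapFreeSpace(page));
}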

> You'd likely prefer a simpler argument for doing this -- an argument
> that doesn't require abandoning/discrediting the idea that a high
> degree of FSM_CATEGORIES-wise precision is a valuable thing. Not sure
> that that's possible -- the current design is at least correct on its
> own terms. And what you propose to do will probably be less correct on
> those same terms, silly though they are.

I've never really understood why you think that the number of
FSM_CATEGORIES is the problem. I believe I recall you endorsing a
system where pages are open or closed, to try to achieve temporal
locality of data. I agree that such a system could work better than
what we have now. I think there's a risk that such a system could
create pathological cases where the behavior is much worse than what
we have today, and I think we'd need to consider carefully what such
cases might exist and what mitigation strategies might make sense.
However, I don't see a reason why such a system should intrinsically
want to reduce FSM_CATEGORIES. If we have two open pages and one of
them has enough space for the tuple we're now trying to insert and the
other doesn't, we'd still like to avoid having the FSM hand us the one
that doesn't.
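
To put the same point in toy-program form (standalone; the encoding is
hypothetical, not the real fsm_space_avail_to_cat() logic, though the
32-byte step matches the real FSM's granularity on 8K pages):

#include <stdio.h>

#define CAT_STEP 32             /* bytes per FSM category */

static int
space_to_cat(int bytes)
{
    return bytes / CAT_STEP;    /* 0..255 on an 8K page */
}

int
main(void)
{
    int page_a_free = 40;       /* open page without enough room */
    int page_b_free = 200;      /* open page with enough room */
    int request = 150;
    int need = (request + CAT_STEP - 1) / CAT_STEP;

    /* even among open pages, size resolution decides the winner */
    printf("page A usable: %s\n",
           space_to_cat(page_a_free) >= need ? "yes" : "no");
    printf("page B usable: %s\n",
           space_to_cat(page_b_free) >= need ? "yes" : "no");
    return 0;
}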

Now, that said, I suspect that we actually could reduce FSM_CATEGORIES
somewhat without causing any real problems, because many tables are
going to have tuples that are all about the same size, and even in a
table where the sizes vary more than is typical, a single tuple can't
consume more than a quarter of the page, so granularity above that
point seems completely useless. So if we needed some bitspace to track
the open/closed status of pages or similar, I suspect we could find
that in the existing FSM byte per page without losing anything. But
all of that is just an argument that reducing the number of
FSM_CATEGORIES is *acceptable*; it doesn't amount to an argument that
it's better. My current belief is that it isn't better, just a vehicle
to do something else that maybe is better, like squeezing open/closed
tracking or similar into the existing bit space. My understanding is
that you think it would be better on its own terms, but I have not yet
been able to grasp why that would be so.
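
Just to make the bitspace idea concrete, here's one hypothetical
encoding (nothing like this exists in the tree today): drop
FSM_CATEGORIES from 256 to 128, i.e. 64-byte granularity, still far
finer than the quarter-page cap, and spend the freed high bit on
open/closed:

#include <stdio.h>
#include <stdint.h>

#define OPEN_FLAG   0x80        /* high bit: page is open for inserts */
#define CAT_MASK    0x7f        /* low 7 bits: free space / 64 */

static uint8_t
encode_slot(int free_bytes, int is_open)
{
    return (uint8_t) ((is_open ? OPEN_FLAG : 0) |
                      ((free_bytes / 64) & CAT_MASK));
}

int
main(void)
{
    uint8_t slot = encode_slot(3000, 1);

    printf("open=%d, free >= ~%d bytes\n",
           (slot & OPEN_FLAG) != 0, (slot & CAT_MASK) * 64);
    return 0;
}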

--
Robert Haas
EDB: http://www.enterprisedb.com
