Re: Eliminate redundant tuple visibility check in vacuum

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-08-31 18:03:10
Message-ID: CA+Tgmobh6pP=sUrgPMdV6Gu5CDBtv6DAgGo-dyj-ZaA_O-XcJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 28, 2023 at 7:49 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> While working on a set of patches to combine the freeze and visibility
> map WAL records into the prune record, I wrote the attached patches
> reusing the tuple visibility information collected in heap_page_prune()
> back in lazy_scan_prune().
>
> heap_page_prune() collects the HTSV_Result for every tuple on a page
> and saves it in an array used by heap_prune_chain(). If we make that
> array available to lazy_scan_prune(), it can use it when collecting
> stats for vacuum and determining whether or not to freeze tuples.
> This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
> the page.
>
> It also gets rid of the retry loop in lazy_scan_prune().

In general, I like these patches. I think it's a clever approach, and
I don't really see any down side. It should just be straight-up better
than what we have now, and if it's not better, it still shouldn't be
any worse.

I have a few suggestions:

- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.

- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places?

- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
- limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+ limited_ts, &presult, NULL);

- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.

- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Muhammad Malik 2023-08-31 18:12:34 Re: Improve heapgetpage() performance, overhead from serializable
Previous Message Денис Смирнов 2023-08-31 17:44:41 Re: Use virtual tuple slot for Unique node