Re: Eliminate redundant tuple visibility check in vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-08-29 13:21:37
Message-ID: CAAKRu_b2DXxvGebDLcp70HYU3QqEnmZ0memSiA+-=2izeBrTYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,
Thanks for taking a look!

On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
>
> Hi Melanie,
>
> On 8/29/23 01:49, Melanie Plageman wrote:
> > While working on a set of patches to combine the freeze and visibility
> > map WAL records into the prune record, I wrote the attached patches
> > reusing the tuple visibility information collected in heap_page_prune()
> > back in lazy_scan_prune().
> >
> > heap_page_prune() collects the HTSV_Result for every tuple on a page
> > and saves it in an array used by heap_prune_chain(). If we make that
> > array available to lazy_scan_prune(), it can use it when collecting
> > stats for vacuum and determining whether or not to freeze tuples.
> > This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
> > the page.
> >
> > It also gets rid of the retry loop in lazy_scan_prune().
>
> How did you test this change?

I didn't add a new test, but you can confirm some existing test
coverage if you, for example, set every HTSV_Result in the array to
HEAPTUPLE_LIVE and run the regression tests. Tests relying on vacuum
removing the right tuples may fail. For example, select count(*) from
gin_test_tbl where i @> array[1, 999]; in src/test/regress/sql/gin.sql
fails for me since it sees a tuple it shouldn't.

> Could you measure any performance difference?
>
> If so could you provide your test case?

I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-08-29 13:27:23 Re: Debian 12 gcc warning
Previous Message torikoshia 2023-08-29 13:15:51 Re: pg_rewind WAL segments deletion pitfall