Re: Eliminate redundant tuple visibility check in vacuum

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-08-31 09:39:49
Message-ID: c9d30046-97b0-2db2-b45e-2aa82386c316@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Melanie,

On 8/31/23 02:59, Melanie Plageman wrote:
>> I created a large table and then updated a tuple on every page in the
>> relation and vacuumed it. I saw a consistent slight improvement in
>> vacuum execution time. I profiled a bit with perf stat as well. The
>> difference is relatively small for this kind of example, but I can
>> work on a more compelling, realistic example. I think eliminating the
>> retry loop is also useful, as I have heard that users have had to
>> cancel vacuums which were in this retry loop for too long.
> Just to provide a specific test case, if you create a small table like this
>
> create table foo (a int, b int, c int) with(autovacuum_enabled=false);
> insert into foo select i, i, i from generate_series(1, 10000000);
>
> And then vacuum it. I find that with my patch applied I see a
> consistent ~9% speedup (averaged across multiple runs).
>
> master: ~533ms
> patch: ~487ms
>
> And in the profile, with my patch applied, you notice less time spent
> in HeapTupleSatisfiesVacuumHorizon()
>
> master:
> 11.83% postgres postgres [.] heap_page_prune
> 11.59% postgres postgres [.] heap_prepare_freeze_tuple
> 8.77% postgres postgres [.] lazy_scan_prune
> 8.01% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
> 7.79% postgres postgres [.] heap_tuple_should_freeze
>
> patch:
> 13.41% postgres postgres [.] heap_prepare_freeze_tuple
> 9.88% postgres postgres [.] heap_page_prune
> 8.61% postgres postgres [.] lazy_scan_prune
> 7.00% postgres postgres [.] heap_tuple_should_freeze
> 6.43% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon

Thanks a lot for providing additional information and the test case.

I tried it on a release build and I also see a 10% speed-up. I reset the
visibility map between VACUUM runs, see:

CREATE EXTENSION pg_visibility; CREATE TABLE foo (a INT, b INT, c INT)
WITH(autovacuum_enabled=FALSE); INSERT INTO foo SELECT i, i, i from
generate_series(1, 10000000) i; VACUUM foo; SELECT
pg_truncate_visibility_map('foo'); VACUUM foo; SELECT
pg_truncate_visibility_map('foo'); VACUUM foo; ...

The first patch, which refactors the code so we can pass the result of
the visibility checks to the caller, looks good to me.

Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
of the code): I don't completely understand why the retry loop is not
needed anymore and how you now detect/handle the possible race
condition? It can still happen that an aborting transaction changes the
state of a row after heap_page_prune() looked at that row. Would that
case now not be ignored?

--
David Geier
(ServiceNow)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2023-08-31 09:45:34 Re: Sync scan & regression tests
Previous Message Amit Kapila 2023-08-31 09:22:47 Re: persist logical slots to disk during shutdown checkpoint