Re: New criteria for autovacuum

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New criteria for autovacuum
Date: 2025-04-04 19:27:45
Message-ID: 39b8c638-0762-4d43-ba9c-68accc902f9f@garret.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 04/04/2025 7:10 pm, Melanie Plageman wrote:
> On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik<knizhnik(at)garret(dot)ru> wrote:
>> What is needed to reproduce the problem?
>> 1. Table with populated data
>> 2. Presence of transaction with assigned XID which prevents vacuum from
>> marking pages of this table as all visible
>> 3. Vacuum or autovacuum processed this table (to eliminate dead tuple
>> and reset number of inserted tuples since last vacuum).
>>
>> After this 3 steps autovacuum will never be called for this table (at
>> least until forced vacuum caused by wraparound).
>> And IOS will not be used or be very inefficient fot this table.
> ISTM, this is more an issue of ins_since_vacuum being reset to 0 in
> pstat_report_vacuum() even though those inserted tuples weren't
> necessarily frozen and their pages not set all-visible. I don't know
> exactly how we could modify that logic, but insert-triggered vacuums
> are meant to set pages all-visible and freeze tuples, and if they
> don't do that, it doesn't quite make sense to zero out the counter
> that could trigger another one.
>
> That being said, long-running transactions are a problem for
> autovacuum in general. Even if you track this stat you are proposing
> about heap fetches by index only scans, you won't know if the long
> running transaction is over and thus if it makes sense to try and
> trigger an autovacuum for that table again anyway.
>
> - Melanie

From logical point of view I agree with you: taken in account number of
inserted tuples makes sense if it allows to mark page as all-visible.
So `ins_since_vacuum` should be better renamed to
`ins_all_visible_since_vacuum` and count only all-visible tuples. If
newly inserted tuple is not visible to all, then it should not be
accounted in statistic and trigger autovacuum. But I have completely no
idea of how to efficiently maintain such counter: we should keep track
of xids of all recently inserted tuples and on each transaction commit
determine which one of them become all-visible.

And your suggestion just to not reset `ins_since_vacuum` until all of
them becomes all-visible may be easier to implement, but under permanent
workload it can lead to situation when `ins_since_vacuum` is never reset
and at each vacuum iteration cause vacuuming of the table. Which may
cause significant degrade of performance. Even without long-living
transactions.

So I still think that maintaining count of heap visibility check is the
best alternative. It quite easy to implement, adds almost no overhead
and this information indicates efficiency of index-only scan. So it
seems to be useful even if not used by autovacuum.

Yes, long-living transactions and vacuum are "antagonists". If there is
long-living transaction, then forcing autovacuum because of number of
visibility checks criteria can also (as in case of not reseting
`ins_since_vacuum` counter) cause degrade of performance because of too
frequent and useless autovacuum runs for the table. But there is big
difference: using `checks_since_vacuum` criteria we trigger autovacuum
next time only when this counter exceeds threshold. Which should not
happen fast because this counter is reset after each vacuum. Unlike
`ins_since_vacuum` counter which you suggested not to reset until pages
are marked as all-visible by vacuum. In the last case autovacuum will be
invoked for the table each `autovacum_naptime` seconds.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Srirama Kucherlapati 2025-04-04 19:31:58 RE: AIX support
Previous Message Tomas Vondra 2025-04-04 19:25:57 Re: Draft for basic NUMA observability