Re: New criteria for autovacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New criteria for autovacuum
Date: 2025-04-04 19:41:36
Message-ID: CAAKRu_bt1C1RcsNs1zGVtS9rcJ_yvp8WW8+0DEz0xuiFXQ0HmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 4, 2025 at 3:27 PM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>
> From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
> So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum. But I have completely no idea of how to efficiently maintain such counter: we should keep track of xids of all recently inserted tuples and on each transaction commit determine which one of them become all-visible.
>
> And your suggestion just to not reset `ins_since_vacuum` until all of them becomes all-visible may be easier to implement, but under permanent workload it can lead to situation when `ins_since_vacuum` is never reset and at each vacuum iteration cause vacuuming of the table. Which may cause significant degrade of performance. Even without long-living transactions.

Right, you definitely can't just never reset it to 0. As it is
currently defined, the behavior is correct, that is how many tuples
were inserted since the last vacuum.

As for your proposed patch, I agree with Robert that triggering
vacuums using the stat you proposed just adds to the problem we know
we already have with wasted vacuuming work due to long-running
transactions.

A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-04-04 19:46:52 Re: Using read stream in autoprewarm
Previous Message Srirama Kucherlapati 2025-04-04 19:31:58 RE: AIX support