Re: New criteria for autovacuum

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New criteria for autovacuum
Date: 2025-04-04 20:03:56
Message-ID: CAA5RZ0sPur3KUt9JXW9J88Lcjosdz73m91SdJnXrFgat7Lqf1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
> So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples.
> If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum.

cumulative stats are not updated for an in-flight transaction, so
effectively, these rows
are not accounted for by vacuum.

I did just realize however that it is strange that a rolledback
session will accumulate n_tup_ins and n_ins_since_vacuum

# session 1
test=# create table t (id int);
CREATE TABLE
test=# begin;
BEGIN
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1

# session 2

test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, n_ins_since_vacuum from pg_stat_all_tables where relname =
't';
n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
0 | 0 | 0 | 0 | 0 |
0
(1 row)

# session 1
ROLLBACK

# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, n_ins_since_vacuum from pg_stat_all_tables where relname =
't';
n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
3 | 0 | 0 | 0 | 3 |
3
(1 row)

ROLLBACKs should not be the norm, but this looks like a bug to me
as it may trigger vacuum based on insert threshold more often.
Thoughts?

> So I still think that maintaining count of heap visibility check is the best alternative.
> It quite easy to implement, adds almost no overhead and this information indicates efficiency
> of index-only scan. So it seems to be useful even if not used by autovacuum.

I agree. It could aid in a user that wants to schedule some manual
vacuum for tables
that need very optimal index-only scans, or to per-table tune autovac
for those types of
tables. I actually think we should add a few columns as mentioned here [0]

>> That being said, long-running transactions are a problem for
>> autovacuum in general. Even if you track this stat you are proposing
>> about heap fetches by index only scans, you won't know if the long
>> running transaction is over and thus if it makes sense to try and
>> trigger an autovacuum for that table again anyway.

> This. It would be really useful to have some kind of a system for
> figuring out when -- in terms of XIDs -- we ought to vacuum which
> table. I think that's a hard problem, but it would help a lot of
> people.

hmm, isn't that what anti-wraparound autovac does? or, I may
have missed the point here completely

[0] https://www.postgresql.org/message-id/CAA5RZ0t1U38qtVAmg3epjh5RBbpT4VRB_Myfp0oGm_73w-UNRA%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-04-04 20:06:45 Re: Statistics Import and Export
Previous Message Nathan Bossart 2025-04-04 19:56:54 Re: Statistics Import and Export