Re: New criteria for autovacuum

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject: Re: New criteria for autovacuum
Date: 2025-04-03 15:29:49
Message-ID: CAJ7c6TPZf5EfLFohUyZXqmqFKOLhMnamZL+CFZuuuGSrcNm=Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Konstantin,

> But the problem can be quite easily reproduced. We can just populate table with some data with some other transaction with assigned XID active.
> Then explicitly vacuum this tables or wait until autovacuum does it.
> At this moment table has no more dead or inserted tuples so autovacuum will not be called for it. But heap pages of this table are still not marked as all-visible.
> And will never be marked as all-visible unless table is updated or is explicitly vacuumed.

I decided to experiment with the scenario you are describing. For
those who likes to have exact steps to reproduce the issue, as I do,
here they are:

Session 1:

```
CREATE TABLE humidity(
ts TIMESTAMP NOT NULL,
city TEXT NOT NULL,
humidity INT NOT NULL);

CREATE INDEX humidity_idx ON humidity (ts, city) INCLUDE (humidity);

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2010-01-01' :: TIMESTAMP,
'2020-12-31', '1 hour') AS ts,
unnest(array['Moscow', 'Berlin']) AS city;
```

Session 2:

```
BEGIN;

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP,
'2025-12-31', '1 hour') AS ts,
unnest(array['Moscow', 'Berlin']) AS city;

-- no COMMIT
```

Session 1:

```
VACUUM humidity;
```

Session 2:

```
COMMIT;
```

Session 1:

```
EXPLAIN (ANALYZE, BUFFERS ON) SELECT humidity FROM humidity WHERE ts
>= '2022-01-01' AND ts < '2022-05-02' AND city = 'Moscow';
```

The result is:

```
Index Only Scan using humidity_idx on humidity (cost=0.42..58.75
rows=67 width=4) (actual time=0.060..7.490 rows=2904.00 loops=1)
Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time
zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone)
AND (city = 'M
oscow'::text))
Heap Fetches: 2904
Index Searches: 1
Buffers: shared hit=123
Planning:
Buffers: shared hit=10
Planning Time: 0.840 ms
Execution Time: 7.964 ms
```

... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.

I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.

Thoughts?

--
Best regards,
Aleksander Alekseev

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2025-04-03 15:49:52 Re: [PATCH] Add sortsupport for range types and btree_gist
Previous Message Alexander Korotkov 2025-04-03 15:26:48 Re: Replace IN VALUES with ANY in WHERE clauses during optimization