Re: Autovacuum giving up on tables after crash because of lack of stats

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Autovacuum giving up on tables after crash because of lack of stats
Date: 2025-01-03 14:46:05
Message-ID: f3nda7f65zki2nliwasqyk3wul423c2anubqmnabvqno3tzw5l@p5gplm4nlsa4
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-12-25 10:10:44 +0900, Michael Paquier wrote:
> While digging again into the support for pgstats flushes across
> checkpoints, mainly to avoid autovacuum giving up on tables after a
> crash if stats entries cannot be found, I've been reminded about this
> point raised by Heikki:
> https://www.postgresql.org/message-id/54fdfc3e-74f6-4ea4-b844-05aa66b39490@iki.fi
>
> The point is quite simple: depending on the timing of the stats file
> flush we may finish with orphaned entries because some objects may be
> dropped after a checkpoint, leaving stats entries when these should be
> removed. This is an issue dependent on the pgstats file flush at
> checkpoint time, and we cannot by design reach this state currently as
> stats are flushed only at shutdown, once all the other backends have
> exited after the shutdown checkpoint is finished. There is a sanity
> check for that in pgstat_write_statsfile():
> Assert(!ps->dropped);

I don't think this has to be an issue. The right thing imo would be to make a
durable copy of the stats at the time of the *redo* pointer
determination. That way the WAL records for dropped stats objects are going to
be replayed, leaving us with no orphaned entries by the time the checkpoint
has been replayed.

Doesn't this solve the issue?

> Up to v14 and 5891c7a8ed, there was a mechanism in place able to do
> garbage collection of stats entries when vacuuming a database, where
> OIDs of various object types are collected and messages were sent to
> the stats collector to clean up things.
>
> With pgstats now in shared memory, we could do something else if we
> want to have some stats for autovacuum post-crash to take some
> decisions rather than giving up. Some ideas I can think of, some of
> them linked to the timing of the pgstats file flush, but not all:
>
> 1) Reimplement the same thing as ~14 in autovacuum workers with a
> flush of the pgstats file at each checkpoint, with a loop in all the
> stats kinds that would trigger a periodic cleanup of potential
> garbage entries, based on a new callback.

I am *vehemently* opposed to this approach.

> 2) The main issue I am trying to tackle is autovacuum giving up on
> tables if there are no stats entries, so we could add *some* WAL
> logging of the relation stats that are relevant for autovacuum, then
> replay them. I think that the correct approach here is to introduce
> one new RMGR for pgstats, giving to each stats kind the possibility
> to call a routine able to do WAL logging of *some* of its data (custom
> structure size, custom data size), and insert records associated to
> their stats kind.

As mentioned in the other thread, I don't think we need this - nearly all the
information is already and in the WAL for the actual action that the stats
track. We should re-emit the stats changes during recovery.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2025-01-03 14:47:09 Re: Fwd: Re: A new look at old NFS readdir() problems?
Previous Message Euler Taveira 2025-01-03 14:31:03 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart