Re: Flush pgstats file during checkpoints

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Flush pgstats file during checkpoints
Date: 2024-09-03 01:08:46
Message-ID: ZtZhnuWotWSI_Xbt@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 02, 2024 at 05:08:03PM +0300, Heikki Linnakangas wrote:
> Hmm, I'm a bit disappointed this doesn't address replication. It makes sense
> that scans are counted separately on a standby, but it would be nice if
> stats like last_vacuum were propagated from primary to standbys. I guess
> that can be handled separately later.

Yes, it's not something that I'm planning to tackle for this thread.
Speaking of which, the design that I got in mind for this area was not
"that" complicated:
- Add a new RMGR for all the stats.
- Add a first callback for stats kinds for WAL inserts, giving to each
stats the possibility to pass down data inserted to the record, as we
want to replicate a portion of the data depending on the kind dealt
with.
- Add a second callback for recovery, called depending on the kind ID.

I have not looked into the details yet, but stats to replicate should
be grouped in a single record on transaction commit or depending on
the flush timing for fixed-numbered stats. Or we should just add them
in commit records?

> Reviewing v7-0001-Flush-pgstats-file-during-checkpoints.patch:
>
> There are various race conditions where a stats entry can be leaked in the
> pgstats file. I.e. relation is dropped, but its stats entry is retained in
> the stats file after crash. In the worst case, suck leaked entries can
> accumulate until the stats file is manually removed, which resets all stats
> again. Perhaps that's acceptable - it's still possible leak the actual
> relation file for a new relation on crash, after all, which is much worse
> (I'm glad Horiguchi-san is working on that [1]).

Yeah, that's not an easy issue. We don't really have a protection
regarding that as well now. Backends can also refer to stats entries
in their shutdown callback that have been dropped concurrently. See
some details about that at https://commitfest.postgresql.org/49/5045/.

> Until 5891c7a8ed, there was a mechanism to garbage collect such orphaned
> entries (pgstat_vacuum()). How bad would it be to re-introduce that? Or can
> we make it more watertight so that there are no leaks?

Not sure about this part, TBH. Doing that again in autovacuum does
not excite me much as it has a cost.

> I think it's failing to flush the stats file at the end of recovery
> checkpoint.

Missed that, oops. I'll double-check this area.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-09-03 01:35:15 Re: Partitioned tables and [un]loggedness
Previous Message Michael Paquier 2024-09-03 00:22:58 Re: Partitioned tables and [un]loggedness