Re: Flush pgstats file during checkpoints

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Flush pgstats file during checkpoints
Date: 2024-09-02 14:08:03
Message-ID: 54fdfc3e-74f6-4ea4-b844-05aa66b39490@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> It'd not be such an issue if we updated stats during recovery, but I
>> think think we're doing that. Perhaps we should, which might also help
>> on replicas - no idea if it's feasible, though.
>
> Stats on replicas are considered an independent thing AFAIU (scans are
> counted for example in read-only queries). If we were to do that we
> may want to split stats handling between nodes in standby state and
> crash recovery. Not sure if that's worth the complication. First,
> the stats exist at node-level.

Hmm, I'm a bit disappointed this doesn't address replication. It makes
sense that scans are counted separately on a standby, but it would be
nice if stats like last_vacuum were propagated from primary to standbys.
I guess that can be handled separately later.

Reviewing v7-0001-Flush-pgstats-file-during-checkpoints.patch:

There are various race conditions where a stats entry can be leaked in
the pgstats file. I.e. relation is dropped, but its stats entry is
retained in the stats file after crash. In the worst case, suck leaked
entries can accumulate until the stats file is manually removed, which
resets all stats again. Perhaps that's acceptable - it's still possible
leak the actual relation file for a new relation on crash, after all,
which is much worse (I'm glad Horiguchi-san is working on that [1]).

For example:
1. BEGIN; CREATE TABLE foo (); ANALYZE foo;
2. CHECKPOINT;
3. pg_ctl restart -m immediate

This is the same scenario where we leak the relfile, but now you can
have it with e.g. function statistics too.

Until 5891c7a8ed, there was a mechanism to garbage collect such orphaned
entries (pgstat_vacuum()). How bad would it be to re-introduce that? Or
can we make it more watertight so that there are no leaks?

If you do this:

pg_ctl start -D data
pg_ctl stop -D data -m immediate
pg_ctl start -D data
pg_ctl stop -D data -m immediate

You get this in the log:

2024-09-02 16:28:37.874 EEST [1397281] WARNING: found incorrect redo
LSN 0/160A3C8 (expected 0/160A440)

I think it's failing to flush the stats file at the end of recovery
checkpoint.

[1]
https://www.postgresql.org/message-id/20240901.010925.656452225144636594.horikyota.ntt%40gmail.com

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-09-02 14:55:52 per backend I/O statistics
Previous Message Laurenz Albe 2024-09-02 14:00:51 Re: proposal: schema variables