From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: Higher level questions around shared memory stats |
Date: | 2022-03-30 21:08:41 |
Message-ID: | 20220330210841.2dymkb2ydbdonp5f@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2022-03-30 14:42:23 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 5:01 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I think it's reasonably rare because in cases there'd be corruption, we'd
> > typically not even have written them out / throw them away explicitly - we
> > only read stats when starting without crash recovery.
> >
> > So the "expected" case of corruption afaicts solely is a OS crash just after
> > the shutdown checkpoint completed?
>
> Can we prevent that case from occurring, so that there are no expected cases?
We likely can, at least for the causes of corruption I know of. We already
write the statsfile into a temporary filename and then rename into place. I
think all we'd need to do is to use durable_rename() to make sure it's durable
once renamed into place.
It's really unrelated to the shared memory stats patch though, so I'd prefer
not to tie it to that.
> > I can think of these different times:
> >
> > - Last time stats were removed due to starting up in crash recovery
> > - Last time stats were created from scratch, because no stats data file was
> > present at startup
> > - Last time stats were thrown away due to corruption
> > - Last time a subset of stats were reset using one of the pg_reset* functions
> >
> > Makes sense?
>
> Yes. Possibly that last could be broken in to two: when all stats were
> last reset, when some stats were last reset.
Believe it or not, we don't currently have a function to reset all stats. We
should definitely add that though, because the invocation to reset all stats
gets more ridiculous^Wcomplicated with each release.
I think the minimal invocation currently is something like:
-- reset all stats shared between databases
SELECT pg_stat_reset_shared('archiver');
SELECT pg_stat_reset_shared('bgwriter');
SELECT pg_stat_reset_shared('wal');
SELECT pg_stat_reset_replication_slot(NULL);
SELECT pg_stat_reset_slru(NULL);
SELECT pg_stat_reset_subscription_stats(NULL);
-- connect to each database and reset the stats in that database
SELECT pg_stat_reset();
I've protested against replication slot, slru, subscription stats not being
resettable via pg_stat_reset_shared(), nobody else seemed to care.
> > > Does redo update the stats?
> >
> > With "update" do you mean generate new stats? In the shared memory stats patch
> > it triggers stats to be dropped, on HEAD it just resets all stats at startup.
> >
> > Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.
>
> Well, I guess what I'm trying to figure out is what happens if we run
> in recovery for a long time -- say, a year -- and then get promoted.
> Do we have reasons to expect that the stats will be accurate enough to
> use at that point, or will they be way off?
What do you mean with 'accurate enough'?
With or without shared memory stats pg_stat_all_tables.{n_mod_since_analyze,
n_ins_since_vacuum, n_live_tup, n_dead_tup ...} will be be zero. The replay
process doesn't update them.
In contrast to that, things like pg_stat_all_tables.{seq_scan, seq_tup_read,
idx_tup_fetch, ...} will be accurate, with one exception below.
pg_stat_bgwriter, pg_stat_wal, etc will always be accurate.
On HEAD, there may be a lot of dead stats for dropped databases / tables /
functions that have been dropped since the start of the cluster. They will
eventually get removed, once autovacuum starts running in the respective
database (i.e. pgstat_vacuum_stat() gets run).
The exception noted above is that because pg_stat_all_tables contents are
never removed during recovery, it becomes a lot more plausible for oid
conflicts to occur. So the stats for two different tables might get added up
accidentally - but that'll just affect the non-zero columns, of course.
With the shared memory stats patch, stats for dropped objects (i.e. databases,
tables, ... ) are removed shortly after they have been dropped, so that
conflict risk doesn't exist anymore.
So I don't think increasing inaccuracy is a reason to throw away stats on
replica startup. Particularly because we already don't throw them away when
promoting the replica, just when having started it last.
> I don't have a great understanding of how this all works, but if
> running recovery for a long time is going to lead to a situation where
> the stats progressively diverge from reality, then preserving them
> doesn't seem as valuable as if they're going to be more or less
> accurate.
Minus the oid wraparound risk on HEAD, the only way they increasingly diverge
is that the '0' in a bunch of pg_stat_all_tables columns might get less and
less accurate. But that's not the type of divergence you're talking about, I
think.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2022-03-30 21:22:58 | Re: [Proposal] vacuumdb --schema only |
Previous Message | David G. Johnston | 2022-03-30 21:05:01 | Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset |