Re: shared-memory based stats collector

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: andres(at)anarazel(dot)de
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, gkokolatos(at)protonmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: shared-memory based stats collector
Date: 2021-03-25 05:02:53
Message-ID: 20210325.140253.1626149859180740869.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the the lot of help!

At Mon, 22 Mar 2021 16:17:37 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in
> Thanks! That change shouldn't be necessary on my branch - I did
> something to fix this kind of problem too. I decided that there's no
> point in doing hash table lookups for the database: It's not going to
> change in the life of a backend. So there's now two static "pending"

Right.

> entries: One for the current DB, one for the shared DB. There's only
> one place that needed to change,
> pgstat_report_checksum_failures_in_db(), which now reports the changes
> directly instead of going via pending.
> I suspect we should actually do that with a number of other DB specific
> functions. Things like recovery conflicts, deadlocks, checksum failures
> imo should really not be delayed till later. And you should never have
> enough of them to make contention a concern.

Sounds readonable.

> You can see a somewhat sensible list of changes from your v52 at
> https://github.com/anarazel/postgres/compare/master...shmstat-before-split-2021-03-22
> (I did fix some of damage from rebase in a non-incremental way, of course)
>
> My branch: https://github.com/anarazel/postgres/tree/shmstat
>
> It would be cool if you could check if there any relevant things between
> v52 and v56 that I should include.
>
>
> I think a lot of the concerns I had with the patch are addressed at the
> end of my series of changes. Please let me know what you think.

I like the name "stats subsystem".

https://github.com/anarazel/postgres/commit/f28463601e93c68f4dd50fe930d29a54509cffc7

I'm impressed that the way you resolved "who should load stats". Using
static shared memory area to hold the point to existing DSA memory
resolves the "first attacher problem". However somewhat doubtful
about the "who should write the stats file", I think it is reasonable
in general.

But the current place of calling pgstat_write_stats() is a bit to
early. Checkpointer reports some stats *after* calling
ShutdownXLOG(). Perhaps we need to move it after pg_stat_report_*()
calls in HandleCheckpointerInterrupts().

Separating pgbestat_backend_initialize() from pgstat_initialize()
allows us to initialize stats subsystem earlier in autovacuum workers,
which looks nice.

https://github.com/anarazel/postgres/commit/3304ee1344f348e079b5eb208d76a2f1553e721c

> * Whenever the for a dropped stats entry could not be freed (because
> * backends still have references), this is incremented, causing backends
> * to run pgstat_lookup_cache_gc(), allowing that memory to be reclaimed.

"Whenever the <what?> for a "

gc_count is incremented whenever *some stats hash entries are
removed*. Some of the delinked shared stats area might not be freed
due to references.

If each backend finds that gc_count is incremented, it removes
corresponding local hash entries to the delinked shared entries. If
the backend was the last referer, it frees the shared area.

https://github.com/anarazel/postgres/commit/88ffb289860c7011e729cd0a1a01cda1899e6209

Ah, it sounds nice that refcount == 1 means it is to be dropped and no
one is referring to it. Thanks!

https://github.com/anarazel/postgres/commit/03824a236597c87c99d07aa14b9af9d6fe04dd37

+ * XXX: Why is this a good place to do this?

Agreed. We don't need to so haste to clean up stats entries. We could
run that in pgstat_reporT_stat()?

flush_walstat()

I found a mistake of an existing comment.

- * If nowait is true, this function returns false on lock failure. Otherwise
- * this function always returns true.
+ * If nowait is true, this function returns true on lock failure. Otherwise
+ * this function always returns false.

https://github.com/anarazel/postgres/commit/7bde068d8a512d918f76cfc88c1c10f1db8fe553
(pgstat_reset_replslot_counter())
+ * AFIXME: pgstats has business no looking into slot.c structures at
+ * this level of detail.

Does just moving name resolution part to pgstatfuncs.c resolve it?
pgstat_report_replslot_drop() have gotten fixed a similar way.

https://github.com/anarazel/postgres/commit/ded2198d93ce5944fc9d68031d86dd84944053f8

Yeah, I forcefully consolidated replslot stats are into stats-hash but
I agree that it would be more natural that replslot stats are
fixed-sized stats.

https://github.com/anarazel/postgres/commit/e2ef1931fb51da56a6ba483c960e034e52f90430

Agreed that it's better to move database stat entries to fixed pointers.

> My next step is going to be to squash all my changes into the base
> patch, and try to extract all the things that I think can be
> independently committed, and to reduce unnecessary diff noise. Once
> that's done I plan to post that series to the list.
>
>
> TODO:
>
> - explain the design at the top of pgstat.c
>
> - figure out a way to deal with the different demands on stats
> consistency / efficiency
>
> - see how hard it'd be to not need collect_oids()
>
> - split pgstat.c
>
> - consider removing PgStatTypes and replacing it with the oid of the
> table the type of stats reside in. So PGSTAT_TYPE_DB would be
> DatabaseRelationId, PGSTAT_TYPE_TABLE would be RelationRelationId, ...
>
> I think that'd make the system more cleanly extensible going forward?

I'm not sure that works as expected. We already separated repliation
stats from the unified stats hash and pgstat_read/write_statsfile()
needs have the corresponding specific code path.

> - I'm not yet happy with the naming schemes in use in pgstat.c. I feel
> like I improved it a bunch, but it's not yet there.

I feel the same about the namings.

> - the replication slot stuff isn't quite right in my branch

Ah, yeah. As I mentioned above I think it should be in the unified
stats and should have a special means of shotcut. And the global
stats also should be the same.

> - I still don't quite like the reset_offset stuff - I wonder if we can
> find something better there. And if not, whether we can deduplicate
> the code between functions like pgstat_fetch_stat_checkpointer() and
> pgstat_report_checkpointer().

Yeah, I find it annoying. If we had reset-offset as negatives (or 2's
complements) the two arithmetic are in the same shape. It might be
somewhat tricky but we can deduplicate the code. (In exchange, we
would have additional code to convert the reset offset.)

> At the very least it'll need a lot better comments.
>
> - bunch of FIXMEs / XXXs

I'll lool more close to the patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2021-03-25 05:06:11 RE: Global snapshots
Previous Message Noah Misch 2021-03-25 04:07:39 Re: Tying an object's ownership to datdba