Re: Pluggable cumulative statistics

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pluggable cumulative statistics
Date: 2024-07-07 23:14:20
Message-ID: ZoshTO9K7O7Z1wrX@paquier.xyz
Lists: pgsql-hackers

On Sun, Jul 07, 2024 at 12:21:26PM +0200, Dmitry Dolgov wrote:
> From what I understand, coordinating custom RmgrIds via a wiki page was
> chosen under the assumption that implementing a table AM with custom WAL
> requires significant effort, which limits the demand for IDs. This
> might not be the same for custom stats -- I've got the impression it's easier
> to create one, and there could be multiple kinds of stats per
> extension (one per component), right? This would mean more kind IDs to
> manage and more effort required to do that.

A given module will likely have a single RMGR, because one RMGR can be
divided into multiple record types. Yes, the same cannot really be
said for stats: one module may want a set of different stats kinds,
because these could have different properties.

My guess is that one fixed-numbered stats kind to track a global state
plus one variable-numbered stats kind would be the most likely
combination. Also, my impression about pg_stat_statements is that we'd
actually need this combination to track the number of entries more
tightly, because scanning all the partitions of the central dshash for
entries with a specific KindInfo would have a high concurrency cost.
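To make the counting argument concrete, here is a minimal, self-contained C sketch. It is a toy model, not the pgstat API: `Partition`, `StatsEntry`, `FixedGlobalState`, `TRACKED_KIND`, `OTHER_KIND` and the helper functions are hypothetical names standing in for the dshash partitions, a variable-numbered kind and a fixed-numbered kind. It assumes the fixed-numbered counter is bumped whenever an entry of the tracked kind is created, so reading the count never has to visit every partition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the central dshash: a few partitions, each
 * with a handful of slots, every entry tagged with the kind owning it. */
#define NUM_PARTITIONS      4
#define SLOTS_PER_PARTITION 8
#define TRACKED_KIND        1   /* variable-numbered kind we want to count */
#define OTHER_KIND          2   /* some unrelated kind sharing the hash */

typedef struct StatsEntry
{
    bool    used;
    int     kind;       /* which stats kind owns this entry */
    int64_t calls;      /* payload of the variable-numbered kind */
} StatsEntry;

typedef struct Partition
{
    StatsEntry slots[SLOTS_PER_PARTITION];
} Partition;

/* Fixed-numbered kind: one global state holding the entry count. */
typedef struct FixedGlobalState
{
    atomic_long entry_count;
} FixedGlobalState;

static Partition partitions[NUM_PARTITIONS];
static FixedGlobalState global_state;   /* zero-initialized, valid for atomics */

/* Create an entry in partition `part`; for the tracked kind, update the
 * fixed-numbered counter at the same time. Returns false if full. */
static bool
entry_insert(int part, int kind, int64_t calls)
{
    assert(part >= 0 && part < NUM_PARTITIONS);
    for (int i = 0; i < SLOTS_PER_PARTITION; i++)
    {
        StatsEntry *e = &partitions[part].slots[i];

        if (!e->used)
        {
            e->used = true;
            e->kind = kind;
            e->calls = calls;
            if (kind == TRACKED_KIND)
                atomic_fetch_add(&global_state.entry_count, 1);
            return true;
        }
    }
    return false;
}

/* Costly path: walk every partition and filter on the kind. */
static long
count_by_scan(int kind)
{
    long n = 0;

    for (int p = 0; p < NUM_PARTITIONS; p++)
        for (int i = 0; i < SLOTS_PER_PARTITION; i++)
            if (partitions[p].slots[i].used && partitions[p].slots[i].kind == kind)
                n++;
    return n;
}

/* Cheap path: a single read of the fixed-numbered state. */
static long
count_from_fixed(void)
{
    return atomic_load(&global_state.entry_count);
}
```

In shared memory, each partition visited by something like `count_by_scan()` would have to be locked in turn, which is where the concurrency cost comes from; the fixed-numbered counter replaces that with one atomic read, at the price of keeping it in sync on every insert (and, in a fuller model, on every removal).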

> I agree though that it makes sense to start this way, it's just simpler.
> But maybe it's worth thinking about some other solution in the long
> term, taking the over-engineered prototype as a sign that more
> refactoring is needed.

Knowing that we use a single, central file to store the stats (as per
the arguments on the redo thread for the stats), the three possible
methods I can think of here are:
- Store the name of the stats kind with each entry. This is very
costly with many entries, and complicates the read-write paths because
currently we rely on the KindInfo.
- Store a mapping between the stats kind names and their KindInfos in
the file at write time, then use the mapping at read time and compare
against it to reassemble the stored entries. KindInfos are assigned at
startup with a unique counter in shmem. As mentioned upthread, I've
implemented something like that, with the custom stats registered in
shmem_startup_hook and requested in shmem_request_hook. That felt
over-engineered, considering that the startup process needs to know
the stats kinds very early anyway, so we need _PG_init() and should
encourage its use.
- Fix the KindInfos permanently and centralize the assigned values.
This eases error handling and can force the custom stats kinds to be
registered when shared_preload_libraries is loaded. Reads are faster,
as there is no need to re-check a mapping to reassemble the stats
entries.
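As a rough illustration of the third method, here is a minimal C sketch of registering custom kinds with fixed, centrally coordinated IDs. Everything in it is hypothetical and is not the PostgreSQL API: the ID range `CUSTOM_MIN`..`CUSTOM_MAX`, `KindInfo`, `register_kind()`, `lookup_kind()` and the `preload_done` flag (modeling "registration only while shared_preload_libraries is being loaded") are assumptions made for the example. It shows the error control that fixed IDs allow at registration time, and that at read time the ID stored with an entry can be used directly, with no name mapping to reassemble.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: built-in kinds below CUSTOM_MIN, custom kinds in a
 * reserved range whose values are coordinated centrally. */
#define CUSTOM_MIN 128
#define CUSTOM_MAX 255

typedef struct KindInfo
{
    const char *name;
    bool        fixed_amount;   /* fixed-numbered vs variable-numbered */
} KindInfo;

typedef enum RegisterResult
{
    REGISTER_OK,
    REGISTER_OUT_OF_RANGE,
    REGISTER_ID_TAKEN,
    REGISTER_NAME_TAKEN,
    REGISTER_TOO_LATE
} RegisterResult;

static const KindInfo *custom_kinds[CUSTOM_MAX - CUSTOM_MIN + 1];

/* Set once library preloading is over; later registrations are refused. */
static bool preload_done = false;

/* Register `info` under the fixed ID `kind`, rejecting anything that would
 * make the IDs stored in the stats file ambiguous. */
static RegisterResult
register_kind(int kind, const KindInfo *info)
{
    assert(info != NULL && info->name != NULL);
    if (preload_done)
        return REGISTER_TOO_LATE;
    if (kind < CUSTOM_MIN || kind > CUSTOM_MAX)
        return REGISTER_OUT_OF_RANGE;
    if (custom_kinds[kind - CUSTOM_MIN] != NULL)
        return REGISTER_ID_TAKEN;
    for (int i = 0; i <= CUSTOM_MAX - CUSTOM_MIN; i++)
        if (custom_kinds[i] != NULL && strcmp(custom_kinds[i]->name, info->name) == 0)
            return REGISTER_NAME_TAKEN;
    custom_kinds[kind - CUSTOM_MIN] = info;
    return REGISTER_OK;
}

/* Read path: the kind ID stored with each entry is a direct array index. */
static const KindInfo *
lookup_kind(int kind)
{
    if (kind < CUSTOM_MIN || kind > CUSTOM_MAX)
        return NULL;
    return custom_kinds[kind - CUSTOM_MIN];
}
```

A real module would more likely raise an error than return a status code, but the point is the same: conflicts are caught once, at startup, instead of being discovered while reading the stats file.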

In the end, fixing the KindInfos permanently is the most reliable
method here (debugging could be slightly easier and less complicated
than with a stored mapping, though it remains doable with all three
methods).
--
Michael
