Re: per backend I/O statistics

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: per backend I/O statistics
Date: 2024-11-14 06:31:51
Message-ID: ZzWZV9LdyZ9aFSWs@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 08, 2024 at 02:09:30PM +0000, Bertrand Drouvot wrote:
> ==== 0001 (the largest one)
>
> Introduces a new statistics KIND. It is named PGSTAT_KIND_PER_BACKEND as it could
> be used in the future to store other statistics (than the I/O ones) per backend.
> The new KIND is a variable-numbered one and has an automatic cap on the
> maximum number of entries (as its hash key contains the proc number).
>
> There is no need to serialize the per backend I/O stats to disk (no point to
> see stats for backends that do not exist anymore after a re-start), so a
> new "to_serialize" field is added in the PgStat_KindInfo struct.
>
> It adds a new pg_my_stat_io view to display "my" backend I/O statistics.
>
> It also adds a new function pg_stat_reset_single_backend_io_counters() to be
> able to reset the I/O stats for a given backend pid.
>
> It contains doc updates and dedicated tests.
>
> A few remarks:
>
> 1. one assertion in pgstat_drop_entry_internal() is not necessary true anymore
> with this new stat kind. So, adding an extra bool as parameter to take care of it.

Why? I have to admit that the addition of this argument at this level
specific to a new stats kind feels really weird and it looks like a
layer violation, while this happens only when resetting the whole
pgstats state because we have loaded a portion of the stats but the
file loaded was in such a state that we don't have a consistent
picture.

> 2. a new struct "PgStat_Backend" is created and does contain the backend type.
> The backend type is used for filtering purpose when the stats are displayed.

Is that necessary? We can guess that from the PID with a join in
pg_stat_activity. pg_stat_get_backend_io() from 0004 returns a set of
tuples for a specific PID, as well..

+ /* save the bktype */
+ if (kind == PGSTAT_KIND_PER_BACKEND)
+ bktype = ((PgStatShared_Backend *) header)->stats.bktype;
[...]
+ /* restore the bktype */
+ if (kind == PGSTAT_KIND_PER_BACKEND)
+ ((PgStatShared_Backend *) header)->stats.bktype = bktype;

Including the backend type results in these blips in
shared_stat_reset_contents() which should not have anything related
to stats kinds and should remain neutral, as well.

pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I
guess that it should be static in this file rather than declared in
pgstat.h?

+typedef struct PgStat_PendingIO

Perhaps this part should use a separate structure named
"BackendPendingIO"? The definition of the structure has to be in
pgstat.h as this is the pending_size of the new stats kind. It looks
like it would be cleaner to keep PgStat_PendingIO local to
pgstat_io.c, and define PgStat_PendingIO based on
PgStat_BackendPendingIO?

+ /*
+ * Do serialize or not this kind of stats.
+ */
+ bool to_serialize:1;

Not sure that "serialize" is the best term that applies here. For
pgstats entries, serialization refers to the matter of writing their
entries with a "serialized" name because they have an undefined number
when stored locally after a reload. I'd suggest to split this concept
into its own patch, rename the flag as "write_to_file" (or what you
think is suited), and also apply the flag in the fixed-numbered loop
done in pgstat_write_statsfile() before going through the dshash.

+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_reset_single_backend_io_counters</primary>
+ </indexterm>
+ <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> )
+ <returnvalue>void</returnvalue>

This should document that the input argument is a PID.

Is pg_my_stat_io() the best name ever? I'd suggest to just merge 0004
with 0001.

Structurally, it may be cleaner to keep all the callbacks and the
backend-I/O specific logic into a separate file, perhaps
pgstat_io_backend.c or pgstat_backend_io?

> ==== 0002
>
> Merge both IO stats flush callback. There is no need to keep both callbacks.
>
> Merging both allows to save O(N^3) while looping on IOOBJECT_NUM_TYPES,
> IOCONTEXT_NUM_TYPES and IOCONTEXT_NUM_TYPES.

Not sure to be a fan of that, TBH, still I get the performance
argument of the matter. Each stats kind has its own flush path, and
this assumes that we'll never going to diverge in terms of the
information maintained for each backend and the existing pg_stat_io.
Perhaps there could be a divergence at some point?

> === Remarks
>
> R1. The key field is still "objid" even if it stores the proc number. I think
> that's fine and there is no need to rename it as it's related comment states:

Let's not change the object ID and the hash key. The approach you are
using here with a variable-numbered stats kinds and a lookup with the
procnumber is sensible here.

> R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged (
> duplicated code). That's not mandatory to provide the new per backend I/O stats
> feature. So let's keep it as future work to ease the review.

Not sure about that.

> R3. There is no use case of retrieving other backend's IO stats with
> stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to
> save memory (that could be non negligeable as pgstat_build_snapshot() would add
> _all_ the other backend's stats in the snapshot). I choose to document it and to
> not return any rows: I think that's better than returning some data (that would
> not "ensure" the snapshot consistency) or an error.

I'm pretty sure that there is a sensible argument about being able to
get a snapshot of the I/O stat of another backend and consider that a
feature. For example, keeping a look at the activity of the
checkpointer while doing a scan at a certain point in time? With a
GUC, we could have both behaviors and control which one we want.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-14 06:34:11 Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Previous Message Nikita Malakhov 2024-11-14 06:19:45 Re: remaining sql/json patches