Re: per backend I/O statistics

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: per backend I/O statistics
Date: 2024-11-04 10:01:50
Message-ID: ZyibjiLgoLx+kS4n@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Thu, Oct 31, 2024 at 05:09:56AM +0000, Bertrand Drouvot wrote:
> === OPTIONS ===
>
> So, based on this, I think that we could:
>
> Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this
> KIND take care of the aggregated view (pg_stat_io) and the per-backend stats.
>
> Option 2: leave PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO
> that would be variable-numbered.
>
> Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that
> would be variable-numbered and store the "aggregated stats aka pg_stat_io" in
> shared memory (not part of the variable-numbered hash). Per-backend stats
> could be aggregated into "pg_stat_io" during the flush_pending_cb call for example.
>
> === BEST OPTION? ===
>
> I would opt for Option 2 as:
>
> - The stats system is currently not designed for Option 1 and our goals: for
> example, shared_data_len is used both to serialize and to fetch the entries
> (see pgstat_fetch_entry()), so that would need some hack to serialize only a
> part of them and still be able to fetch them all.
>
> - Mixing "fixed" and "variable" in the same KIND does not sound like a good idea
> (though that might be possible with some hacks, I don't think that would be
> easy to maintain).
>
> - Having the per-backend as "variable" in its dedicated kind looks more reasonable
> and less error-prone.
>
> - I don't think there is a stats design similar to option 3 currently, so I'm
> not sure there is a need to develop something new while Option 2 could be done.
>
> - Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization
> part.
>
> - Option 2 seems to offer more flexibility (as compared to Options 1 and 3).
>
> Thoughts?

And why not add more per-backend stats in the future, once the I/O part is done?

I think that's one more reason to go with Option 2 (and to implement a brand new
PGSTAT_KIND_BACKEND kind).
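
To make the split in Option 2 a bit more concrete, here is a toy sketch. It is
not PostgreSQL code: every name in it (ToyIoCounts, toy_count_io(), the
fixed-size array standing in for the shared hash, and so on) is made up for
illustration. It only shows the general shape of the idea: one fixed entry for
the aggregated view, plus one entry per backend that is created on demand and
can be dropped without touching the aggregate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BACKENDS 8		/* hypothetical cap, for this sketch only */

/* Counters used by both kinds (hypothetical layout). */
typedef struct ToyIoCounts
{
	uint64_t	reads;
	uint64_t	writes;
} ToyIoCounts;

/* "Fixed-numbered" kind: exactly one entry, like an aggregated view. */
static ToyIoCounts toy_io_aggregate;

/* "Variable-numbered" kind: one entry per key (here, a backend id). */
typedef struct ToyBackendEntry
{
	int			in_use;
	int			backend_id;
	ToyIoCounts counts;
} ToyBackendEntry;

/* A plain array stands in for a real hash table keyed by backend. */
static ToyBackendEntry toy_backend_entries[TOY_MAX_BACKENDS];

/* Find the entry for backend_id; optionally create it if missing. */
static ToyBackendEntry *
toy_backend_lookup(int backend_id, int create)
{
	ToyBackendEntry *free_slot = NULL;

	for (size_t i = 0; i < TOY_MAX_BACKENDS; i++)
	{
		ToyBackendEntry *e = &toy_backend_entries[i];

		if (e->in_use && e->backend_id == backend_id)
			return e;
		if (!e->in_use && free_slot == NULL)
			free_slot = e;
	}

	if (!create || free_slot == NULL)
		return NULL;

	free_slot->in_use = 1;
	free_slot->backend_id = backend_id;
	free_slot->counts.reads = 0;
	free_slot->counts.writes = 0;
	return free_slot;
}

/* Record I/O once; both kinds are updated independently. */
static void
toy_count_io(int backend_id, uint64_t reads, uint64_t writes)
{
	ToyBackendEntry *e;

	assert(backend_id >= 0);

	toy_io_aggregate.reads += reads;
	toy_io_aggregate.writes += writes;

	e = toy_backend_lookup(backend_id, 1);
	if (e != NULL)
	{
		e->counts.reads += reads;
		e->counts.writes += writes;
	}
}

/* Drop a backend's entry (e.g. at backend exit); the aggregate is kept. */
static void
toy_backend_drop(int backend_id)
{
	ToyBackendEntry *e = toy_backend_lookup(backend_id, 0);

	if (e != NULL)
		e->in_use = 0;
}
```

In this sketch, calling toy_count_io(1, 3, 1) bumps both the aggregate and the
entry for backend 1, and toy_backend_drop(1) removes only the per-backend
entry. Adding other per-backend counters later would mean extending the
per-backend entry only, which is the kind of flexibility a dedicated kind is
meant to give.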

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
