Re: per backend I/O statistics

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: per backend I/O statistics
Date: 2025-01-15 14:20:57
Message-ID: CAN55FZ1uOq=FVJObp0bdj-Z8q1ZRNmA-RymPqbMD+p4QaHXP3A@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, 15 Jan 2025 at 12:27, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:
> > On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:
> >> With this commit it may not be possible to count IOs in the critical
> >> sections. I think the problem happens only if the local
> >> PgStat_BackendPending entry is being created for the first time for
> >> this backend in the critical section.
> >
> > Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).
> >
> > In [1] I did propose to use a new PendingBackendWalStats variable to "bypass" the
> > pgstat_prep_backend_pending() usage.
> >
> > Michael mentioned in [2] that is not really consistent with the rest (what I
> > agree with) and that "we should rethink a bit the way pending entries are
> > retrieved". I did not think about it yet but that might be the way to
> > go, thoughts?
> >
> > [1]: https://www.postgresql.org/message-id/Z3zqc4o09dM/Ezyz%40ip-10-97-1-34.eu-west-3.compute.internal
> > [2]: https://www.postgresql.org/message-id/Z4dRlNuhSQ3hPPv2%40paquier.xyz

Thanks! I must have missed these emails.

>
> My problem is that this is not only related to backend stats, but to
> all variable-numbered stats kinds that require this behavior. One
> other case where I've seen this as being an issue is injection point
> stats, for example, while enabling these stats for all callbacks and
> some of them are run in critical sections.
>
> A generic solution to the problem would be to allow
> pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
> temporarily for the allocation of the pending data.
> Perhaps not for all stats kinds, just for these where we're OK with
> the potential memory footprint so we could have a flag in
> PgStat_KindInfo. Giving to all stats kinds the responsibility to
> allocate a pending entry beforehand outside a critical section is
> another option, but that means going through different tweaks for all
> stats kinds that require them.

I think allowing only pgStatPendingContext to have
MemoryContextAllowInCriticalSection() is not enough. We need to allow
at least pgStatSharedRefContext as well, since the cached entry ref is
allocated in it on the same path:

'''
pgstat_prep_pending_entry() ->
pgstat_get_entry_ref() ->
pgstat_get_entry_ref_cached() ->
MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''
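
To make that a bit more concrete, here is a minimal sketch (not a
proposal) of toggling both contexts around the pending-entry
allocation, gated by a hypothetical flag in PgStat_KindInfo along the
lines Michael suggested. The flag name and the helper are made up, and
it assumes both contexts are visible at this point (today
pgStatSharedRefContext is static to pgstat_shmem.c):

'''
/*
 * Hypothetical helper: temporarily allow critical-section allocations
 * in the two contexts that pgstat_prep_pending_entry() allocates from.
 * "pending_in_critsec" would be a new flag in PgStat_KindInfo; both it
 * and this helper are made up, just to illustrate the idea.
 */
static void
pgstat_allow_pending_alloc_in_critsec(PgStat_Kind kind, bool allow)
{
	if (CritSectionCount == 0 ||
		!pgstat_get_kind_info(kind)->pending_in_critsec)
		return;

	/* pending data is allocated in pgStatPendingContext ... */
	if (pgStatPendingContext)
		MemoryContextAllowInCriticalSection(pgStatPendingContext, allow);
	/* ... and the cached PgStat_EntryRef in pgStatSharedRefContext */
	if (pgStatSharedRefContext)
		MemoryContextAllowInCriticalSection(pgStatSharedRefContext, allow);
}
'''

pgstat_prep_pending_entry() (or its caller) would toggle this on before
allocating the pending data and the entry ref, and off again right
after.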

Also, I encountered another problem. I did not mention it in my first
email because I thought I had fixed it, but it seems I did not.

During initdb, we restore stats with pgstat_restore_stats() and do not
expect any pending stats there. The problem goes like this:

'''
pgstat_restore_stats() ->
pgstat_read_statsfile() ->
pgstat_reset_after_failure() ->
pgstat_drop_all_entries() ->
pgstat_drop_entry_internal() ->
There is an assertion there which checks that the local reference to
the entry has already been released:

/* should already have released local reference */
if (pgStatEntryRefHash)
	Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
'''

And we have this at the same time:

'''
BootstrapModeMain() ->
InitPostgres() ->
StartupXLOG() ->
ReadCheckpointRecord() ->
InitWalRecovery() ->
... ->
XLogReadAhead() ->
XLogDecodeNextRecord() ->
ReadPageInternal() ->
state->routine.page_read = XLogPageRead(), which is where the WAL read happens
'''

So this assertion fails because we have pending stats for
PGSTAT_KIND_BACKEND. My fix (the diff in [1] below) was simply to skip
restoring stats while bootstrapping, but it did not work: all
fixed-numbered stats are reset on that same path
(pgstat_reset_after_failure() -> kind_info->reset_all_cb()), so if we
skip it we end up with some NULL stats, for example the reset
timestamps. Some of the tests do not expect them to be NULL at server
start, like below:

SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
SELECT pg_stat_reset_shared('archiver');
SELECT stats_reset > :'archiver_reset_ts'::timestamptz FROM pg_stat_archiver;

This test fails because archiver_reset_ts is NULL when the server is started.
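
Just to make the missing piece concrete, a variant of the tweak in [1]
that still resets the fixed-numbered stats during bootstrap would look
roughly like the sketch below. The helper name is made up; this is only
to illustrate what skipping the restore loses, not a proposal:

'''
	if (didCrash)
		pgstat_discard_stats();
	else if (IsBootstrapProcessingMode())
	{
		/*
		 * Hypothetical: run the fixed-numbered stats' reset_all_cb()
		 * callbacks so the reset timestamps get set, without reading
		 * the statsfile.
		 */
		pgstat_reset_fixed_stats_only();
	}
	else
		pgstat_restore_stats(checkPoint.redo);
'''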

I was a bit surprised that Bertrand did not encounter the same problem
while working on the 'per backend WAL statistics' patch. Then I found
the reason: the problem only happens for WAL read and WAL init IOs,
which start to be tracked with my patch. Because of that, I could not
decide which thread to report this problem on, the migrating WAL stats
thread or this one. Since this thread is active and other stats kinds
may hit the same problem, I decided to write here. Please let me know
if you think I should raise it on the migrating WAL stats thread
instead.

[1]
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..0538859cc7f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5681,7 +5681,7 @@ StartupXLOG(void)
 	 */
 	if (didCrash)
 		pgstat_discard_stats();
-	else
+	else if (!IsBootstrapProcessingMode())
 		pgstat_restore_stats(checkPoint.redo);

--
Regards,
Nazir Bilal Yavuz
Microsoft
