Re: per backend I/O statistics

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: per backend I/O statistics
Date: 2025-01-16 00:55:10
Message-ID: Z4hY7pKjgLP5jrnX@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 15, 2025 at 05:20:57PM +0300, Nazir Bilal Yavuz wrote:
> I think allowing only pgStatPendingContext to have
> MemoryContextAllowInCriticalSection() is not enough. We need to allow
> at least pgStatSharedRefContext as well to have
> MemoryContextAllowInCriticalSection() as it can be allocated too.
>
> '''
> pgstat_prep_pending_entry() ->
> pgstat_get_entry_ref() ->
> pgstat_get_entry_ref_cached() ->
> MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
> '''

Yep, I was pretty sure that we have a bit more going on. Last time I
began digging into the issue I was loading injection_points in
shared_preload_libraries with stats enabled to see how much I could
break, and this was one pattern once I've forced the pending part. I
didn't get through the whole exercise.

> While doing the initdb, we are restoring stats with the
> pgstat_restore_stats() and we do not expect any pending stats. The
> problem goes like that:
>
> '''
> pgstat_restore_stats() ->
> pgstat_read_statsfile() ->
> pgstat_reset_after_failure() ->
> pgstat_drop_all_entries() ->
> pgstat_drop_entry_internal() ->
> We have an assertion there which checks if there is a pending stat entry:
>
> /* should already have released local reference */
> if (pgStatEntryRefHash)
> Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
> '''

You mean that this is not something that happens on HEAD, but as an
effect of trying to add WAL stats to pg_stat_io, right?

> I was a bit surprised that Bertrand did not encounter the same problem
> while working on the 'per backend WAL statistics' patch. Then I found
> the reason, it is because this problem happens only for WAL read and
> WAL init IOs which are starting to be tracked in my patch. By saying
> that, I could not decide which thread to write about this problem,
> migrating WAL stats thread or this thread. Since this thread is active
> and other stats may cause the same problem, I decided to write here.
> Please warn me if you think I need to write this to the migrating WAL
> stats thread.

I'd rather treat that on a separate thread (please add me in CC if I'm
not there yet!). I am OK to discuss any issues related to backend
statistics here if HEAD is impacted based on how the feature is
limited currently now, though.

Another, simpler, idea would be to force more calls
pgstat_prep_backend_pending() where this is problematic, like before
entering a critical section when generating a WAL record. I am not
sure that it would take care of everything, as you mention; that would
leave some holes depending on the kind of interactions with the
pgstats dshash we do in these paths.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2025-01-16 01:00:51 Re: An improvement of ProcessTwoPhaseBuffer logic
Previous Message Peter Geoghegan 2025-01-16 00:51:39 Re: Having problems generating a code coverage report