Re: RFC: Allow EXPLAIN to Output Page Fault Information

From: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, rjuju123(at)gmail(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date: 2025-02-09 20:06:02
Message-ID: CAGECzQTm6oEEY4yO_FO0ZBtUuJX+pYiXhw2GPPRMzq_5DP5_fQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 9 Feb 2025 at 19:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I'm somewhat against this patch, as it's fairly fundamentally incompatible
> > with AIO. There's no real way to get information in this manner if the IO
> > isn't executed synchronously in process context...

Hmm, I had not considered how this would interact with your AIO work.
I agree that getting this info would be hard/impossible to do
efficiently, when IOs are done by background IO processes that
interleave IOs from different queries. But I'd expect that AIOs that
are done using iouring would be tracked correctly without having to
change this code at all (because I assume those are done from the
query backend process).

One other thought: I think the primary benefit of this feature is
being able to see how many read IOs actually hit the disk, as opposed
to hitting OS page cache. That benefit disappears when using Direct
IO, because then there's no OS page cache.

How many years away do you think that widespread general use of
AIO+Direct IO is, though? I think that for the N years from now until
then, it would be very nice to have this feature to help debug query
performance problems. Then once the numbers become too
inaccurate/useless at some point, we could simply remove them again.

> Even without looking ahead to AIO, there's bgwriter, walwriter, and
> checkpointer processes that all take I/O load away from foreground
> processes. I don't really believe that this will produce useful
> numbers.

The bgwriter, walwriter, and checkpointer should only take away
*write* IOs. For read IOs the numbers should be very accurate and as
explained above read IOs is where I think the primary benefit of this
feature is.

But even for write IOs I think the numbers would be useful when
looking at them with the goal of finding out why a particular query is
slow: If the bgwriter or checkpointer do the writes, then the query
should be roughly as fast as if no writes to the disk had taken place
at all, but if the query process does the writes then those writes are
probably blocking further execution of the query and thus slowing it
down.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Brazeal 2025-02-09 20:17:27 Re: Confusing variable naming in LWLockRelease
Previous Message Alena Rybakina 2025-02-09 19:47:26 Re: Replace IN VALUES with ANY in WHERE clauses during optimization