Re: RFC: Allow EXPLAIN to Output Page Fault Information

From: Jeremy Schneider <schneider(at)ardentperf(dot)com>
To: "Jelte Fennema-Nio" <postgres(at)jeltef(dot)nl>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "torikoshia" <torikoshia(at)oss(dot)nttdata(dot)com>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date: 2024-12-27 18:32:43
Message-ID: 20241227103243.39922304@jeremy-ThinkPad-T430s
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 27 Dec 2024 15:15:40 +0100
"Jelte Fennema-Nio" <postgres(at)jeltef(dot)nl> wrote:

> On Tue Dec 24, 2024 at 4:52 PM CET, Tom Lane wrote:
> > torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> writes:
> >> I have attached a PoC patch that modifies EXPLAIN to include page
> >> fault information during both the planning and execution phases of
> >> a query.
> >
> > Surely these numbers would be too unstable to be worth anything.
>
> What makes you think that? I'd expect them to be similarly stable to
> the numbers we get for BUFFERS. i.e. Sure they won't be completely
> stable, but I expect them to be quite helpful when debugging perf
> issues, because large numbers indicate that the query is disk-bound
> and small numbers indicate that it is not.
>
> These numbers seem especially useful for setups where shared_buffers
> is significantly smaller than the total memory available to the
> system. In those cases the output from BUFFERS might give the
> impression that that you're disk-bound, but if your working set still
> fits into OS cache then the number of page faults is likely still
> low. Thus telling you that the numbers that you get back from BUFFERS
> are not as big of a problem as they might seem.

We'd probably need to combine both pg_buffercache_evict() and
/proc/sys/vm/drop_caches to get stable numbers - which is something I
have done in the past for testing.

Another thought would be splitting out the IO timing information into
two values - IO timing for reads that triggered major faults, versus
IO timing for reads that did not.

And system views like pg_stat_database seem worth considering too.

-Jeremy

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message James Hunter 2024-12-27 19:14:00 Re: Add the ability to limit the amount of memory that can be allocated to backends.
Previous Message Tom Lane 2024-12-27 17:30:51 Re: Connection limits/permissions, slotsync workers, etc