Quick Links

Re: RFC: Allow EXPLAIN to Output Page Fault Information

From:	torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To:	Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	pgsql-hackers(at)postgresql(dot)org, rjuju123(at)gmail(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date:	2025-02-10 13:23:06
Message-ID:	30ce8c5be7d1f67d53276540c29fa1c7@oss.nttdata.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2025-02-10 05:06, Jelte Fennema-Nio wrote:

Thanks for reviewing the patch and comments!
Fixed issues you pointed out and attached v2 patch.

> On Sun, 9 Feb 2025 at 19:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>> > I'm somewhat against this patch, as it's fairly fundamentally incompatible
>> > with AIO. There's no real way to get information in this manner if the IO
>> > isn't executed synchronously in process context...
>
> Hmm, I had not considered how this would interact with your AIO work.
> I agree that getting this info would be hard/impossible to do
> efficiently, when IOs are done by background IO processes that
> interleave IOs from different queries. But I'd expect that AIOs that
> are done using iouring would be tracked correctly without having to
> change this code at all (because I assume those are done from the
> query backend process).
>
> One other thought: I think the primary benefit of this feature is
> being able to see how many read IOs actually hit the disk, as opposed
> to hitting OS page cache. That benefit disappears when using Direct
> IO, because then there's no OS page cache.
>
> How many years away do you think that widespread general use of
> AIO+Direct IO is, though? I think that for the N years from now until
> then, it would be very nice to have this feature to help debug query
> performance problems. Then once the numbers become too
> inaccurate/useless at some point, we could simply remove them again.

AIO efforts are something I haven't fully grasped yet, but Jelte's
comments seem reasonable to me.
Of course, as someone proposing this, I'm naturally biased toward
thinking it’s beneficial.
What do you think?

>> Even without looking ahead to AIO, there's bgwriter, walwriter, and
>> checkpointer processes that all take I/O load away from foreground
>> processes. I don't really believe that this will produce useful
>> numbers.
>
> The bgwriter, walwriter, and checkpointer should only take away
> *write* IOs. For read IOs the numbers should be very accurate and as
> explained above read IOs is where I think the primary benefit of this
> feature is.
>
> But even for write IOs I think the numbers would be useful when
> looking at them with the goal of finding out why a particular query is
> slow: If the bgwriter or checkpointer do the writes, then the query
> should be roughly as fast as if no writes to the disk had taken place
> at all, but if the query process does the writes then those writes are
> probably blocking further execution of the query and thus slowing it
> down.

I agree with this as well.

For example, in a SELECT query executed immediately after a large number
of INSERTs, we can observe that writes to storage occur due to WAL
writes for hint bits.
This makes the query take longer compared to a scenario where these
writes do not occur.
I think we can guess what is happening from the output:

postgres=# insert into t1 (select i, repeat('a', 1000) from
generate_series(1, 1000000) i);
INSERT 0 1000000

postgres=# explain analyze table t1;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..382665.25 rows=21409025 width=36) (actual
time=1.926..11035.531 rows=1000100 loops=1)
Buffers: shared read=168575 dirtied=142858 written=142479
Planning:
Buffers: shared hit=3 read=3 written=1
Storage I/O: read=48 times write=16 times
Planning Time: 4.472 ms
Execution:
Storage I/O: read=2697272 times write=4480096 times // many writes
Execution Time: 11099.424 ms // slow
(9 rows)

postgres=# explain analyze table t1;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..382665.25 rows=21409025 width=36) (actual
time=2.066..2926.394 rows=1000100 loops=1)
Buffers: shared read=168575 written=14
Planning Time: 0.295 ms
Execution:
Storage I/O: read=2697200 times write=224 times // few writes
Execution Time: 3016.257 ms // fast
(6 rows)

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment	Content-Type	Size
v2-0001-Add-storage-I-O-tracking-to-BUFFERS-option.patch	text/x-diff	33.9 KB

In response to

Re: RFC: Allow EXPLAIN to Output Page Fault Information at 2025-02-09 20:06:02 from Jelte Fennema-Nio

Responses

Re: RFC: Allow EXPLAIN to Output Page Fault Information at 2025-03-17 23:52:05 from Jelte Fennema-Nio

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Nisha Moond	2025-02-10 13:27:25	Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message	vignesh C	2025-02-10 12:42:12	Re: Introduce XID age and inactive timeout based replication slot invalidation