From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | pgsql-hackers(at)postgresql(dot)org, rjuju123(at)gmail(dot)com, Bruce Momjian <bruce(at)momjian(dot)us> |
Subject: | Re: RFC: Allow EXPLAIN to Output Page Fault Information |
Date: | 2025-02-10 13:23:06 |
Message-ID: | 30ce8c5be7d1f67d53276540c29fa1c7@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2025-02-10 05:06, Jelte Fennema-Nio wrote:
Thanks for reviewing the patch and comments!
Fixed issues you pointed out and attached v2 patch.
> On Sun, 9 Feb 2025 at 19:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>> > I'm somewhat against this patch, as it's fairly fundamentally incompatible
>> > with AIO. There's no real way to get information in this manner if the IO
>> > isn't executed synchronously in process context...
>
> Hmm, I had not considered how this would interact with your AIO work.
> I agree that getting this info would be hard/impossible to do
> efficiently, when IOs are done by background IO processes that
> interleave IOs from different queries. But I'd expect that AIOs that
> are done using iouring would be tracked correctly without having to
> change this code at all (because I assume those are done from the
> query backend process).
>
> One other thought: I think the primary benefit of this feature is
> being able to see how many read IOs actually hit the disk, as opposed
> to hitting OS page cache. That benefit disappears when using Direct
> IO, because then there's no OS page cache.
>
> How many years away do you think that widespread general use of
> AIO+Direct IO is, though? I think that for the N years from now until
> then, it would be very nice to have this feature to help debug query
> performance problems. Then once the numbers become too
> inaccurate/useless at some point, we could simply remove them again.
AIO efforts are something I haven't fully grasped yet, but Jelte's
comments seem reasonable to me.
Of course, as someone proposing this, I'm naturally biased toward
thinking it’s beneficial.
What do you think?
>> Even without looking ahead to AIO, there's bgwriter, walwriter, and
>> checkpointer processes that all take I/O load away from foreground
>> processes. I don't really believe that this will produce useful
>> numbers.
>
> The bgwriter, walwriter, and checkpointer should only take away
> *write* IOs. For read IOs the numbers should be very accurate and as
> explained above read IOs is where I think the primary benefit of this
> feature is.
>
> But even for write IOs I think the numbers would be useful when
> looking at them with the goal of finding out why a particular query is
> slow: If the bgwriter or checkpointer do the writes, then the query
> should be roughly as fast as if no writes to the disk had taken place
> at all, but if the query process does the writes then those writes are
> probably blocking further execution of the query and thus slowing it
> down.
I agree with this as well.
For example, in a SELECT query executed immediately after a large number
of INSERTs, we can observe that writes to storage occur due to WAL
writes for hint bits.
This makes the query take longer compared to a scenario where these
writes do not occur.
I think we can guess what is happening from the output:
postgres=# insert into t1 (select i, repeat('a', 1000) from
generate_series(1, 1000000) i);
INSERT 0 1000000
postgres=# explain analyze table t1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..382665.25 rows=21409025 width=36) (actual
time=1.926..11035.531 rows=1000100 loops=1)
Buffers: shared read=168575 dirtied=142858 written=142479
Planning:
Buffers: shared hit=3 read=3 written=1
Storage I/O: read=48 times write=16 times
Planning Time: 4.472 ms
Execution:
Storage I/O: read=2697272 times write=4480096 times // many writes
Execution Time: 11099.424 ms // slow
(9 rows)
postgres=# explain analyze table t1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..382665.25 rows=21409025 width=36) (actual
time=2.066..2926.394 rows=1000100 loops=1)
Buffers: shared read=168575 written=14
Planning Time: 0.295 ms
Execution:
Storage I/O: read=2697200 times write=224 times // few writes
Execution Time: 3016.257 ms // fast
(6 rows)
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Add-storage-I-O-tracking-to-BUFFERS-option.patch | text/x-diff | 33.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nisha Moond | 2025-02-10 13:27:25 | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Previous Message | vignesh C | 2025-02-10 12:42:12 | Re: Introduce XID age and inactive timeout based replication slot invalidation |