Re: RFC: Allow EXPLAIN to Output Page Fault Information

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, rjuju123(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date: 2025-01-27 09:04:59
Message-ID: 2c9d6eaf26df17bec13bb03bf1e9bcbb@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Jan 7, 2025 at 5:09 AM Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
wrote:
> The core functionality works well in my opinion. I think it makes sense
> to spend the effort to move this from PoC quality to something
> committable. Below some of the things that are necessary to do that
> after an initial pass over the code (and trying it out):

Attached a Patch.

On Tue, Jan 7, 2025 at 10:42 PM Atsushi Torikoshi
<torikoshia(dot)tech(at)gmail(dot)com> wrote:
>> 9. I think this division by 2 could use some explanation in a comment.
>> I
>> understand that you're doing this because linux divides its
>> original
>> bytes using 512 bytes[2] and your additional factor of 2 gets that
>> to
>> 1024 bytes. But that's not clear immediately from the code.
>>
>> I'm also not convinced that 512 is the blocksize if this logic is
>> even correct on every platform. I'm wondering if maybe we should
>> simply show the blocks after all.
>
> Maybe so. I'll look into this and then decide the unit.

I looked up the manuals for the following operating systems, as
documented in [1], and it seems that all of them—except Windows—support
getrusage(2) and return ru_inblock/ru_oublock:
Linux, Windows, FreeBSD, OpenBSD, NetBSD, DragonFlyBSD, macOS, AIX,
Solaris, and illumos.

However, I’m unsure if the unit of these values is consistently 512KB
across all operating systems.
Additionally, I’m concerned that the timing of when these metrics are
incremented might vary between OSs.
For example, on Linux, it seems that ru_oublock is incremented when a
page is dirtied, as it’s calculated by dividing write_bytes [2] by 9.
I’m not sure if other operating systems behave the same way.

That said, they all represent the number of storage I/O operations
performed or to be performed.
Therefore, I believe it would be reasonable to report the raw values
as-is, as they should still be useful for understanding storage I/O
activity.

Example output:
=# explain analyze select max(a), max(b) from t_big_ul;
(..snip..)
Planning:
Buffers: shared hit=31 read=54
Storage I/O: read=2744 times write=0 times
Planning Time: 30.685 ms
Execution:
Storage I/O: read=2563600 times write=0 times
Execution Time: 1685.272 ms

[1] https://www.postgresql.org/docs/devel/supported-platforms.html
[2] https://www.kernel.org/doc/Documentation/filesystems/proc.txt

--
Regards,

Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment Content-Type Size
v1-0001-Add-storage-I-O-tracking-to-BUFFERS-option.patch text/x-diff 33.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2025-01-27 09:05:54 Re: New process of getting changes into the commitfest app
Previous Message Jelte Fennema-Nio 2025-01-27 09:03:07 Re: New process of getting changes into the commitfest app