Re: doc: explain pgstatindex fragmentation

From: Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Benoit Lobréau <benoit(dot)lobreau(at)dalibo(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: doc: explain pgstatindex fragmentation
Date: 2025-02-04 21:41:37
Message-ID: 44fac45e-e813-41e0-a0d4-cd3cbb6c48ab@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/27/25 20:56, Ants Aasma wrote:
>> I'll set the patch "ready for committer".

Thanks!

>> I personally would still like to know how fragmentation slows down performance.
> Probable reason is that scanning an unfragmented index results in
> sequential I/O patterns that the kernel read-ahead mechanism detects
> and does prefetching for us. Fragmented index looks like random I/O
> and gets to wait for the full disk latency for every index page.
>
> This is one of the tricky parts to fix for AIO, as directIO will also
> bypass this mechanism. PostgreSQL would need to start issuing those
> prefetches itself to not have a regression there.
>
> In a theoretical world, where we would be able to drive prefetches
> from an inner B-tree page, the difference between fragmented and
> unfragmented indexes would be much less.

Thank you Ants, I think you are right.

I run my test again with kernel readahead disabled (with the blockdev
--setra 0 /dev/sdX command), and I obtained the following numbers with
the unfragmented index:

Buffers: shared hit=1 read=4953
I/O Timings: shared read=252.167

and these with the fragmented one:

Buffers: shared hit=1 read=4904
I/O Timings: shared read=569.341

With kernel readahead enabled, it was:

Buffers: shared hit=1 read=4953
I/O Timings: shared read=18.087

and:

Buffers: shared hit=1 read=4904
I/O Timings: shared read=336.984

I run the tests many times and there is very little variation.

We see that random access benefits a little from kernel readahead, but I
suspect that's because the blocks of the index aren't completely
scattered across the disk.

More interestingly, when kernel readahead is disabled, we see that
scanning the fragmented index still takes twice as long as scanning of
unfragmented one. AFAIK, this is normal for a SSD. Isn't it? (I always
thought that random reads and sequential reads would be almost equally
fast on an SSD, but this does not seem to be the case).

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-02-04 21:42:24 Re: should we have a fast-path planning for OLTP starjoins?
Previous Message Thomas Munro 2025-02-04 21:25:35 Re: hash_search_with_hash_value is high in "perf top" on a replica