Block / Page Size Optimization

From:	Gunther <raj(at)gusw(dot)net>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Block / Page Size Optimization
Date:	2019-04-08 15:09:07
Message-ID:	3c840f8b-73f0-aae7-6bcf-e22d2a0a6a40@gusw.net
Lists:	pgsql-performance

Hi all, I am sure this should be a FAQ, but I can't see a definitive
answer, only chatter on various lists and forums.

Default page size of PostgreSQL is 8192 bytes.

Default IO block size in Linux is 4096 bytes.

I can set an XFS file system with 8192 bytes block size, but then it
does not mount on Linux, because the VM page size is the limit, 4096 again.
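
To check these sizes on a given machine, a minimal Python sketch along the following lines may help. It assumes a Unix-like system. The path "/" is only an illustrative choice, and the value reported by statvfs is the file system's preferred I/O block size, which is not necessarily the device sector size.

```python
import os

# VM page size as reported by the kernel (typically 4096 on x86-64 Linux).
page_size = os.sysconf("SC_PAGE_SIZE")

# Preferred I/O block size of the file system that holds "/".
# "/" is an illustrative path; point this at the mount that holds
# the PostgreSQL data directory instead.
fs_block_size = os.statvfs("/").f_bsize

print(f"VM page size:           {page_size} bytes")
print(f"File system block size: {fs_block_size} bytes")
```

The PostgreSQL page size itself can be read from a running server with `SHOW block_size;`.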

There seems to be no way to change that in (most, common) Linux
variants. In FreeBSD there appears to be a way to change that.

But then, there is a hardware limit also, as far as the VM memory page
allocation is concerned. Apparently on most i386 / amd64 architectures the
VM page sizes are 4k, 2M, and 1G. The latter two, I believe, are called
"hugepages" and I only ever see that discussed in the PostgreSQL manuals
for Linux, not for FreeBSD.

People have asked: does it matter? And then there is all that chatter
about "why don't you run a benchmark and report back to us" -- "OK, will
do" -- and then it's crickets.

But why is this such a secret?

On Amazon AWS there is the following very simple situation: IO is capped
on IO operations per second (IOPS). Let's say, on a smallish volume, I
get 300 IOPS (once my burst balance is used up.)

Now my simple theoretical reasoning is this: one IO call transfers 1
block of 4k size. That means, with a cap of 300 IOPS, I get to send 1.17
MB per second. That would be the absolute limit. BUT, if I could double
the transfer size to 8k, I should be able to move 2.34 MB per second.
Shouldn't I?
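
The arithmetic behind this reasoning can be written out as a small Python sketch. It only encodes the simplistic model stated above (every I/O operation moves exactly one block, and the IOPS cap is the only limit); the function name is hypothetical and nothing here reflects how AWS actually accounts for I/O.

```python
def throughput_mib_per_s(iops: int, io_size_bytes: int) -> float:
    """Throughput under the naive model: one block per I/O, IOPS-capped."""
    return iops * io_size_bytes / (1024 * 1024)


# IOPS caps and block sizes taken from the message.
for iops in (300, 3000):
    for io_size in (4096, 8192):
        mib = throughput_mib_per_s(iops, io_size)
        print(f"{iops:>4} IOPS x {io_size} bytes = {mib:.2f} MiB/s")
```

Under this model the transfer size per operation is the only lever, so any observed throughput well above these figures suggests that a single operation is moving more than one 4k block.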

That might well depend on whether AWS' virtual device paths would
support these 8k block sizes.

But something tells me that my reasoning here is totally off. Because I
get better IO throughput than that. Even on 3000 IOPS I would only get
11 MB per second, and I am sure I am getting rather 50-100 MB/s, no? So
my simplistic logic is false.

What really is the theoretical issue with the file system block size?
Where does -- in theory -- the benefit come from of using an XFS block
size of 8 kB, or even increasing the PostgreSQL page size to 16 kB and
then the XFS block size also to 16 kB? I remember having seen standard
UFS block sizes of 16 kB. But then why is Linux so tough on refusing to
mount an 8 kB XFS because its VM page size is only 4 kB?

Doesn't this all have one straight explanation?

If you have a link that I can just read, I would appreciate you sharing it.
I think that should be on some Wiki or FAQ somewhere. If I get a quick
and dirty explanation with some pointers, I can try to write it out into
a more complete answer that might be added into some documentation or
FAQ somewhere.

thanks & regards,
-Gunther

Responses

Re: Block / Page Size Optimization at 2019-04-08 16:28:46 from Andres Freund
Re: Block / Page Size Optimization at 2019-04-15 16:19:06 from Tomas Vondra
