From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me> |
Subject: | Re: Allow io_combine_limit up to 1MB |
Date: | 2025-02-14 17:06:33 |
Message-ID: | wgxyeb5yuyi25itl2oufnvqi3pl763vvhsysrqq6de7vhjyl46@o32rtkfovwsn |
Lists: | pgsql-hackers |
Hi,
On 2025-02-14 09:32:32 +0100, Jakub Wartak wrote:
> On Wed, Feb 12, 2025 at 1:03 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > FWIW, I see substantial performance *regressions* with *big* IO sizes using
> > fio. Just looking at cached buffered IO.
> >
> > for s in 4 8 16 32 64 128 256 512 1024 2048 4096 8192; do
> >     echo -ne "$s\t\t"
> >     numactl --physcpubind 3 fio --directory /srv/dev/fio/ --size=32GiB \
> >         --overwrite 1 --time_based=0 --runtime=10 --name test --rw read \
> >         --buffered 0 --ioengine psync --buffered 1 --invalidate 0 \
> >         --output-format json --bs=$((1024*${s})) | jq '.jobs[] | .read.bw_mean'
> > done
> >
> > io size kB throughput in MB/s
> [..]
> > 256 16864
> > 512 19114
> > 1024 12874
> [..]
>
> > It's worth noting that if I boot with mitigations=off clearcpuid=smap I get
> > *vastly* better performance:
> >
> > io size kB throughput in MB/s
> [..]
> > 128 23133
> > 256 23317
> > 512 25829
> > 1024 15912
> [..]
> > Most of the gain isn't due to mitigations=off but to clearcpuid=smap. Apparently
> > SMAP, which requires explicit code to allow kernel space to access userspace
> > memory (to make exploitation harder), reacts badly to copying lots of memory.
> >
> > This seems absolutely bonkers to me.
>
> There are two bizarre things there: a +35% perf boost just like that due
> to security drama, and io_size=512kB being so special that it gives a
> 10-13% boost in your case. Any ideas why?
I think there are a few overlapping "cost factors", and 512kB happens to be
where their combined cost bottoms out (see the sketch after this list):
- syscall overhead: the fewer syscalls, the better
- memory copy cost: per-byte cost is higher for small-ish amounts, then lower
- SMAP costs: seem to increase with larger amounts of memory
- CPU cache: copying less than the L3 cache will be faster; beyond that, memory
  bandwidth starts to play a role
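
A rough way to poke at those factors on a given box - just a sketch, assuming
the same cached-file setup as the recipe above; kernel symbol names and lscpu
output vary between systems:

# Is SMAP advertised by the CPU, and was it disabled on the kernel command line?
grep -o -w -m1 smap /proc/cpuinfo     # prints "smap" if the CPU supports it
cat /proc/cmdline                     # look for clearcpuid=smap / mitigations=off

# How big is L3, to compare against the block-size sweet spot?
lscpu | grep -i 'L3 cache'

# While the fio loop above is running, see how much kernel time goes to the
# user<->kernel copy path (symbol names differ between kernel versions):
perf top -e cycles:k                  # e.g. rep_movs_alternative, copy_user_*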
> I ran this on that Lsv2
> individual MS NVMe under Hyper-V, on ext4, which seems to be a much more
> real-world, average-Joe situation. It is much slower, and it shows no
> advantage for block sizes beyond, let's say, 128:
>
> io size kB throughput in MB/s
> 4 1070
> 8 1117
> 16 1231
> 32 1264
> 64 1249
> 128 1313
> 256 1323
> 512 1257
> 1024 1216
> 2048 1271
> 4096 1304
> 8192 1214
>
> The top hitters are, of course, things like clear_page_rep [k] and
> rep_movs_alternative [k] (that was with mitigations=on).
I think you're measuring something different than I was. I was purposefully
measuring a fully-cached workload, which worked with that recipe because I
have more than 32GB of RAM available. But I assume you're running this in a VM
that doesn't have that much, and thus you're actually benchmarking reading data
from disk and, probably more influential in this case, finding buffers to
put the newly read data in.
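
A quick way to check which of the two you're measuring - again just a sketch,
adjust the device name, and the sizes refer to the recipe above:

grep MemTotal /proc/meminfo    # is there room for the 32GiB test file at all?
free -g                        # how much of it is currently free vs. cached
iostat -xm 5                   # sustained reads on the nvme device => not a cached run

If the file doesn't fit, shrinking --size until it does and reading it once
before measuring should get you back to the cached case the earlier numbers
were about.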
Greetings,
Andres Freund