From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2025-02-14 18:44:36
Message-ID: xif2lgn7obsi5brj7llkzomcia2pn5nwqlyjnkjruknknclbws@vgw2kaldktxw
Lists: pgsql-hackers

Hi,
On 2025-02-14 18:18:47 +0100, Tomas Vondra wrote:
> FWIW this does not change anything in the detection of sequential access
> patterns, discussed nearby, because the benchmarks started before Andres
> looked into that. If needed, I can easily rerun these tests, I just need
> a patch to apply.
>
> But if there really is some sort of issue, it'd make sense why it's much
> worse on the older SATA SSDs, while NVMe devices perform somewhat
> better. Because AFAICS the NVMe devices are better at handling random
> I/O with shorter queues.
I think the results are complicated because there are two counteracting
factors influencing performance:
1) read stream doing larger reads -> considerably faster
2) read stream not doing prefetching -> more IO stalls
1) will be a bigger boon on disks where you're not bottlenecked as much by
interface limits. Whereas SATA is limited to ~500MB/s, NVMe started out at
3GB/s. So this gain will matter more on NVMes.
At least on my machine 2) is what causes CPU idle states to kick in, which is
what causes a good bit of the slowdown. How expensive the idle states are,
how quickly they kick in, etc. seem to depend a lot on the CPU model, BIOS
settings and "platform settings" (mainboard manufacturer settings).
The worse a disk is at random IO, the longer the stalls are (adding time) and
the deeper the idle state that can be reached (further increasing latency).
I.e. SATA will be worse.
It might be interesting to run the benchmark with CPU idle states disabled, at
least on the subset of cores you run the test on. E.g.
cpupower -c 13 idle-set -D1
will disable idle states that have a transition time worse than 1us for core
13.
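Wrapped around the benchmark, that could look roughly like this (untested
sketch; <benchmark> is a placeholder for the actual test invocation, and the
core number just mirrors the example above):

  # disable idle states with a transition latency of 1us or worse on core 13
  sudo cpupower -c 13 idle-set -D1
  # check what ended up disabled (1 = disabled)
  grep . /sys/devices/system/cpu/cpu13/cpuidle/state*/disable
  <benchmark>
  # re-enable all idle states afterwards
  sudo cpupower -c 13 idle-set -E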
Sometimes disabling idle states for all cores will have deleterious effects,
due to reducing the thermal budget for turbo boost. E.g. on my older
workstation a core can boost to 3.4GHz if the whole system is at -E (all idle
states enabled) and only to 3GHz at -D0 (all idle states disabled).
Instead of disabling idle states, you could also just monitor them (cpupower
monitor <benchmark> or turbostat --quiet <benchmark>).
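For instance (again just a sketch, with <benchmark> as a placeholder; which
monitors are available differs between systems):

  # summarize C-state residency over the benchmark run
  cpupower monitor -m Idle_Stats <benchmark>
  # or, also showing effective core frequencies
  sudo turbostat --quiet <benchmark>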
Greetings,
Andres Freund