From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: AIO writes vs hint bits vs checksums |
Date: | 2024-11-19 17:15:58 |
Message-ID: | 76cimgfclkp3obxi5kdfw2dzgdmhkngvkozc3bidhn2s3posyh@4vus26rw65ja |
Lists: | pgsql-hackers |
Hi,
On 2024-11-01 14:10:54 -0400, Andres Freund wrote:
> I still don't like this idea a whole lot - but perhaps we could reduce the
> overhead of my proposal some, to get closer to yours. When setting hint bits
> for many tuples on a page the overhead of my approach is negligible, but when doing it
> for individual tuples it's a bit less negligible.
>
> We can reduce the efficiency difference substantially by adding a bufmgr.c API
> that sets hints on a page. That function can set the hint bit while holding the
> buffer header lock, and therefore doesn't need to set BM_SETTING_HINTS and
> thus also doesn't need to do resowner.c accounting.
>
> To see the worst case overhead, I
> a) disabled the "batch" optimization
> b) disabled checksums, as that otherwise would hide small efficiency
> differences
> c) used an unlogged table
>
> and measured the performance difference for a previously-unhinted sequential
> scan of a narrow table that immediately discards all tuples due to OFFSET -
> afaict the worst case for the proposed new behaviour.
>
> Previously this was 30.8% slower than master. Now it's only 1.9% slower.
>
> With the batch optimization enabled, the patchset is 7.5% faster.
>
>
> I also looked at the performance impact on scans that cannot use the batched
> approach. The worst case I could think of was a large ordered indexscan of a
> previously unhinted table.
>
> For an IOS, the performance difference is a slowdown of 0.65%.
>
> But the difference being so small is partially just due to IOS being
> considerably slower than a plain index scan when all tuples need to be fetched
> from the table (we need to address that...). Forcing a non-IOS index scan using
> enable_indexonlyscan, I get a slowdown of 5.0%.
The attached patchset implements this approach and a few additional
optimizations:
- The hint bit functions now take care of marking the buffer dirty, this saves
one trip through bufmgr.c and eventually could be used to remove one atomic
operation. It also looks cleaner imo.
- I had not removed the page copying from XLogSaveBufferForHint(). Changing
that does make the patchset faster even in the index-scan case, if a hint
bit WAL record needs to be emitted.
XLogSaveBufferForHint() should probably use XLogRegisterBuffer() instead of
XLogRegisterBlock(), but we'd need to adjust assertions...
- I added a patch to optimize LockBufHdr() by avoiding init_local_spin_delay()
in the common path. With this applied on *both sides*, the regression
vanishes (but see more below).
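To illustrate the shape of the last two ideas (a lock fast path that skips spin-delay setup, and a hint-setting function that sets the bit and marks the buffer dirty under the header lock), here is a minimal standalone sketch. It is not code from the patchset: every name (ToyBufferDesc, toy_lock_hdr(), toy_set_hint_bit(), the TOY_BM_* flags) and the bit layout are hypothetical stand-ins, and it uses C11 atomics rather than PostgreSQL's own atomics and spin-delay infrastructure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins: names and bit layout are hypothetical, not bufmgr.c's. */
#define TOY_BM_LOCKED (UINT32_C(1) << 0)
#define TOY_BM_DIRTY  (UINT32_C(1) << 1)

typedef struct ToyBufferDesc
{
    _Atomic uint32_t state;     /* header lock bit plus flags in one word */
    uint16_t         hint_bits; /* stand-in for on-page hint bits */
} ToyBufferDesc;

/* Contended path: only here is any spin bookkeeping set up. */
static uint32_t
toy_lock_hdr_slow(ToyBufferDesc *buf)
{
    uint32_t old;
    unsigned spins = 0;         /* stand-in for spin-delay state */

    do
    {
        /* wait until the lock looks free before retrying the atomic RMW */
        while (atomic_load(&buf->state) & TOY_BM_LOCKED)
            spins++;
        old = atomic_fetch_or(&buf->state, TOY_BM_LOCKED);
    } while (old & TOY_BM_LOCKED);

    (void) spins;
    return old | TOY_BM_LOCKED;
}

/* Common path: one atomic fetch-or, no delay state initialized. */
static inline uint32_t
toy_lock_hdr(ToyBufferDesc *buf)
{
    uint32_t old = atomic_fetch_or(&buf->state, TOY_BM_LOCKED);

    if (!(old & TOY_BM_LOCKED))
        return old | TOY_BM_LOCKED;
    return toy_lock_hdr_slow(buf);
}

/* Publish the new flags and release the header lock in one store. */
static inline void
toy_unlock_hdr(ToyBufferDesc *buf, uint32_t new_state)
{
    assert(atomic_load(&buf->state) & TOY_BM_LOCKED);
    atomic_store(&buf->state, new_state & ~TOY_BM_LOCKED);
}

/*
 * Set a hint bit and mark the buffer dirty inside a single header-lock
 * critical section. Returns true if the hint was newly set.
 */
static bool
toy_set_hint_bit(ToyBufferDesc *buf, uint16_t hint)
{
    uint32_t state = toy_lock_hdr(buf);
    bool     changed = (buf->hint_bits & hint) != hint;

    if (changed)
    {
        buf->hint_bits |= hint;
        state |= TOY_BM_DIRTY;
    }
    toy_unlock_hdr(buf, state);
    return changed;
}
```

In this toy version the uncontended case costs one atomic read-modify-write to lock and one store to unlock, and the dirty flag rides along with the unlock instead of needing a separate call.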
I've not fully polished the later patches; I would like to get some agreement
on the approach before doing that.
I spent an unhealthily large amount of time trying to benchmark this. Largely
because there's some very odd bimodal performance distribution that I can't
figure out. I benchmarked a whole-table indexscan on an unhinted relation. IFF
testing a WAL logged relation, there are two different "centers" on a graph of
latencies of individual benchmark runs.
This happens even on master, on a quiesced system, with the database on tmpfs,
turbo mode disabled etc. On two different systems. Unfortunately there are
fairly long stretches of time where I see one duration, which makes it hard to
just tackle this by running the benchmark for longer.
I settled for just comparing both the slow and the fast times separately.
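One way to do such a per-mode comparison mechanically is sketched below. How the runs were actually separated is not stated above, so the split rule here (sort the run times and cut at the largest gap between neighbours) is purely an assumption, and the names ModeSplit, split_modes() and slowdown_pct() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ModeSplit
{
    double fast_mean;   /* mean of the runs below the cut */
    double slow_mean;   /* mean of the runs at or above the cut */
    size_t fast_count;
    size_t slow_count;
} ModeSplit;

static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/*
 * Split run times into a fast and a slow group at the largest gap between
 * adjacent sorted values. Sorts 'times' in place; needs at least two runs.
 */
static ModeSplit
split_modes(double *times, size_t n)
{
    ModeSplit s = {0.0, 0.0, 0, 0};
    size_t    cut = 1;
    double    best_gap = -1.0;

    assert(n >= 2);
    qsort(times, n, sizeof(double), cmp_double);

    for (size_t i = 1; i < n; i++)
    {
        double gap = times[i] - times[i - 1];

        if (gap > best_gap)
        {
            best_gap = gap;
            cut = i;
        }
    }

    for (size_t i = 0; i < n; i++)
    {
        if (i < cut)
            s.fast_mean += times[i];
        else
            s.slow_mean += times[i];
    }
    s.fast_count = cut;
    s.slow_count = n - cut;
    s.fast_mean /= (double) s.fast_count;
    s.slow_mean /= (double) s.slow_count;
    return s;
}

/* Relative slowdown of 'patched' versus 'baseline', in percent. */
static double
slowdown_pct(double baseline, double patched)
{
    return (patched - baseline) / baseline * 100.0;
}
```

The intended use would be to call split_modes() once on the master timings and once on the patched timings, then compare fast_mean against fast_mean and slow_mean against slow_mean with slowdown_pct(), rather than comparing overall averages that depend on how long each mode happened to last.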
Greetings,
Andres Freund