Re: AIO writes vs hint bits vs checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: AIO writes vs hint bits vs checksums
Date: 2024-11-19 17:15:58
Message-ID: 76cimgfclkp3obxi5kdfw2dzgdmhkngvkozc3bidhn2s3posyh@4vus26rw65ja
Lists: pgsql-hackers

Hi,

On 2024-11-01 14:10:54 -0400, Andres Freund wrote:
> I still don't like this idea a whole lot - but perhaps we could reduce the
> overhead of my proposal some, to get closer to yours. When setting hint bits
> for many tuples on a page the overhead of my approach is negligible, but when doing it
> for individual tuples it's a bit less negligible.
>
> We can reduce the efficiency difference substantially by adding a bufmgr.c API
> that set hints on a page. That function can set the hint bit while holding the
> buffer header lock, and therefore doesn't need to set BM_SETTING_HINTS and
> thus also doesn't need to do resowner.c accounting.
>
> To see the worst case overhead, I
> a) disabled the "batch" optimization
> b) disabled checksums, as that otherwise would hide small efficiency
> differences
> c) used an unlogged table
>
> and measured the performance difference for a previously-unhinted sequential
> scan of a narrow table that immediately discards all tuples due to OFFSET -
> afaict the worst case for the proposed new behaviour.
>
> Previously this was 30.8% slower than master. Now it's only 1.9% slower.
>
> With the batch optimization enabled, the patchset is 7.5% faster.
>
>
> I also looked at the performance impact on scans that cannot use the batched
> approach. The worst case I could think of was a large ordered indexscan of a
> previously unhinted table.
>
> For an IOS, the performance difference is a slowdown of 0.65%.
>
> But the difference being so small is partially just due to IOS being
> considerably slower than a plain index scan when all tuples need to be fetched
> from the table (we need to address that...). Forcing a non-IOS index scan using
> enable_indexonlyscan, I get a slowdown of 5.0%.
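
For readers following along, the single-hint path described in the quoted text can be modelled roughly as below. This is a standalone sketch, not the code from the patchset: the DemoBuffer struct, the DEMO_* flag bits and both function names are hypothetical, and treating "a write is in progress" as the only condition that blocks hint setting is an assumption. The point it illustrates is that a hint set entirely under the header lock needs no separate "setting hints" flag and no per-backend bookkeeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits in one state word; not PostgreSQL's actual layout. */
#define DEMO_BM_LOCKED          (1u << 0)   /* header spinlock bit */
#define DEMO_BM_IO_IN_PROGRESS  (1u << 1)   /* page is being written out */
#define DEMO_BM_DIRTY           (1u << 2)

typedef struct DemoBuffer
{
    _Atomic uint32_t state;
    uint16_t    infomask;   /* stands in for one tuple's hint bits */
} DemoBuffer;

/* Spin until the lock bit is ours; returns the state including the lock bit. */
static uint32_t
demo_lock_hdr(DemoBuffer *buf)
{
    for (;;)
    {
        uint32_t    old = atomic_fetch_or(&buf->state, DEMO_BM_LOCKED);

        if (!(old & DEMO_BM_LOCKED))
            return old | DEMO_BM_LOCKED;
    }
}

/* Publish the new state and release the lock bit in a single store. */
static void
demo_unlock_hdr(DemoBuffer *buf, uint32_t new_state)
{
    assert(atomic_load(&buf->state) & DEMO_BM_LOCKED);
    atomic_store(&buf->state, new_state & ~DEMO_BM_LOCKED);
}

/*
 * Set one hint bit while holding the header lock. A writer is assumed to set
 * DEMO_BM_IO_IN_PROGRESS under the same lock before it reads the page, so a
 * hint set here cannot change a page that is being written out. The buffer
 * is marked dirty in the same critical section (also an assumption of this
 * sketch). Returns false if the hint had to be skipped.
 */
static bool
demo_set_hint_single(DemoBuffer *buf, uint16_t hint)
{
    uint32_t    state = demo_lock_hdr(buf);
    bool        ok = !(state & DEMO_BM_IO_IN_PROGRESS);

    if (ok)
    {
        buf->infomask |= hint;
        state |= DEMO_BM_DIRTY;
    }
    demo_unlock_hdr(buf, state);
    return ok;
}
```

A caller that gets false back simply goes without the hint; hint bits are an optimization, so skipping one is always safe.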

The attached patchset implements this approach and a few additional
optimizations:

- The hint bit functions now take care of marking the buffer dirty. This saves
one trip through bufmgr.c and eventually could be used to remove one atomic
operation. It also looks cleaner imo.

- I had not yet removed the page copying from XLogSaveBufferForHint(). Changing
that does make the patchset faster even in the index-scan case, if a hint
bit WAL record needs to be emitted.

XLogSaveBufferForHint() should probably use XLogRegisterBuffer() instead of
XLogRegisterBlock(), but we'd need to adjust assertions...

- I added a patch to optimize LockBufHdr() by avoiding init_local_spin_delay()
in the common path. With this applied on *both sides*, the regression
vanishes (but see more below).
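
To illustrate the shape of the LockBufHdr() change in the last item, here is a minimal standalone sketch of a fast/slow path split on a spinlock bit. It is not the patch itself: SketchSpinDelay, SKETCH_LOCKED and the sketch_* functions are hypothetical stand-ins, and the real spin-delay logic is reduced to a counter. The idea shown is only that the delay state is set up after the first attempt has failed, so the uncontended path is a single atomic operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_LOCKED (1u << 0)     /* hypothetical lock bit in the state word */

/* Stand-in for the spin-delay bookkeeping; only the slow path touches it. */
typedef struct SketchSpinDelay
{
    int         spins;
} SketchSpinDelay;

/*
 * Slow path: reached only under contention. Real code would keep this out of
 * line and back off between attempts; here it just counts the retries.
 */
static uint32_t
sketch_lock_slow(_Atomic uint32_t *state)
{
    SketchSpinDelay delay = {0};    /* set up only once we know we must wait */

    for (;;)
    {
        uint32_t    old = atomic_fetch_or(state, SKETCH_LOCKED);

        if (!(old & SKETCH_LOCKED))
            return old | SKETCH_LOCKED;
        delay.spins++;
    }
}

/* Fast path: one atomic fetch-or, no delay state touched if uncontended. */
static inline uint32_t
sketch_lock(_Atomic uint32_t *state)
{
    uint32_t    old = atomic_fetch_or(state, SKETCH_LOCKED);

    if (!(old & SKETCH_LOCKED))
        return old | SKETCH_LOCKED;
    return sketch_lock_slow(state);
}

/* Publish the new state and clear the lock bit in a single store. */
static inline void
sketch_unlock(_Atomic uint32_t *state, uint32_t new_state)
{
    assert(atomic_load(state) & SKETCH_LOCKED);
    atomic_store(state, new_state & ~SKETCH_LOCKED);
}
```

Whether the split pays off depends on how often the lock is contended; in this model the gain is only that the common case skips the delay setup.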

I've not fully polished the later patches; I would like to get some agreement
on the approach before doing that.

I spent an unhealthily large amount of time trying to benchmark this. Largely
because there's some very odd bimodal performance distribution that I can't
figure out. I benchmarked a whole-table indexscan on an unhinted relation. IFF
testing a WAL logged relation, there are two different "centers" on a graph of
latencies of individual benchmark times.

This happens even on master, on a quiesced system, with the database on tmpfs,
turbo mode disabled etc. On two different systems. Unfortunately there are
fairly long stretches of time where I see one duration, which makes it hard to
just tackle this by running the benchmark for longer.

I settled for just comparing both the slow and the fast times separately.

Greetings,

Andres Freund

Attachment Content-Type Size
v2-0001-Add-very-basic-test-for-kill_prior_tuples.patch text/x-diff 26.4 KB
v2-0002-heapam-Move-logic-to-handle-HEAP_MOVED-into-a-hel.patch text/x-diff 11.5 KB
v2-0003-bufmgr-Add-BufferLockHeldByMe.patch text/x-diff 2.3 KB
v2-0004-heapam-Use-exclusive-lock-on-old-page-in-CLUSTER.patch text/x-diff 3.2 KB
v2-0005-heapam-Only-set-tuple-s-block-once-per-page-in-pa.patch text/x-diff 1.6 KB
v2-0006-bufmgr-Separate-slow-fast-path-of-LockBufHdr.patch text/x-diff 2.7 KB
v2-0007-heapam-Add-batch-mode-mvcc-check-and-use-it-in-pa.patch text/x-diff 7.9 KB
v2-0008-bufmgr-Make-it-easier-to-change-number-of-buffer-.patch text/x-diff 2.2 KB
v2-0009-bufmgr-Add-interface-to-acquire-right-to-set-hint.patch text/x-diff 20.4 KB
v2-0010-heapam-Acquire-right-to-set-hint-bits.patch text/x-diff 8.1 KB
v2-0011-Acquire-right-to-set-hint-bits-in-the-remaining-p.patch text/x-diff 7.0 KB
v2-0012-bufmgr-Don-t-copy-pages-while-writing-out.patch text/x-diff 10.2 KB
v2-0013-bufmgr-Detect-some-missing-BufferPrepareToSetHint.patch text/x-diff 1.6 KB
v2-0014-WIP-bufmgr-Detect-some-bad-buffer-accesses.patch text/x-diff 14.5 KB
