Re: AIO writes vs hint bits vs checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: AIO writes vs hint bits vs checksums
Date: 2024-10-30 02:21:32
Message-ID: lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached is a, unfortunately long, series of patches implementing what I
described upthread.

0001 Add very basic test for kill_prior_tuples

We currently don't exercise this patch for gist and hash, which seems
somewhat criminal.

0002 heapam: Move logic to handle HEAP_MOVED into a helper function

A prep patch, which consolidates HEAP_MOVED handling into a helper. This
isn't strictly necessary, but I got bothered by it while writing this patch
series.

0003 bufmgr: Add BufferLockHeldByMe()

Useful for writing assertions in later patches.

0004 heapam: Use exclusive lock on old page in CLUSTER

When I started looking into this I wanted to make sure that it's actually
safe to skip setting hint bits in all places. One place that's *not* safe
is cluster - we assume that all hint bits are set in rewrite_heap_tuple().

This might not strictly be required, but it seemed very fragile as
is. While using an exclusive lock does seem like it makes it a bit safer,
it's not a very good fix, as we still dirty the old buffer, which seems
like a shame when one uses VACUUM FULL.

0005 heapam: Only set tuple's block once per page in pagemode

Minor, but measurable, optimization.

0006 heapam: Add batch mode mvcc check and use it in page mode

This is a good bit faster on its own, but is required to avoid a
performance regression later, when setting hint bits only only when no IO
going on at the same.

0007 bufmgr: Make it easier to change number of buffer state bits

This just makes it easier to change the division of bits in
BufferDesc->state.

0008 bufmgr: Add interface to acquire right to set hint bits

The main change. Adds BufferPrepareToSetHintBits() which, if true is
returned, gives the rigth to modify a buffer without an exclusive
lock. Once the modification is done BufferFinishSetHintBits() needs to be
called.

I dithered a bit on the name, we don't really call them hint bits in all
the places that modify buffers without an exlusive lock.

The new infra is unused as of this commit.

0009 heapam: Acquire right to set hint bits

Moves over heapam to the new interface.

0010 Acquire right to set hint bits in the remaining places

Moves indexams and freespace/ over to the new interface.

0011 bufmgr: Detect some missing BufferPrepareToSetHintBits() calls

Now that we don't set hint bits without calling
BufferPrepareToSetHintBits() first, we can detect many missing calls by
adding an assertion to MarkBufferDirtyHint().

0012 bufmgr: Don't copy pages while writing out

Finally remove the copy of the buffer while writing it out.

0013 WIP: bufmgr: Detect some bad buffer accesses

This is a lot more thorough infrastructure for detecting modifications of
buffers without appropriate locks, by using mprotect() to detect such
cases. Only works when not using huge pages, and not on windows (not sure
how mprotect() is spelled there) and not arm macos (due to 16kB pages).

Has two modes, one just disallows modifications, and is reasonably
cheap. The more expensive mode disallows any access when the buffer is
unpinned. The commit message has some performance numbers.

This has proven to be a good investment as far as this thread goes - I
wasn't initially aware that freespace modified buffers without a lock and
this quickly found that.

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-Add-very-basic-test-for-kill_prior_tuples.patch text/x-diff 26.3 KB
v1-0002-heapam-Move-logic-to-handle-HEAP_MOVED-into-a-hel.patch text/x-diff 11.5 KB
v1-0003-bufmgr-Add-BufferLockHeldByMe.patch text/x-diff 2.3 KB
v1-0004-heapam-Use-exclusive-lock-on-old-page-in-CLUSTER.patch text/x-diff 3.2 KB
v1-0005-heapam-Only-set-tuple-s-block-once-per-page-in-pa.patch text/x-diff 1.6 KB
v1-0006-heapam-Add-batch-mode-mvcc-check-and-use-it-in-pa.patch text/x-diff 8.0 KB
v1-0007-bufmgr-Make-it-easier-to-change-number-of-buffer-.patch text/x-diff 2.2 KB
v1-0008-bufmgr-Add-interface-to-acquire-right-to-set-hint.patch text/x-diff 15.4 KB
v1-0009-heapam-Acquire-right-to-set-hint-bits.patch text/x-diff 8.3 KB
v1-0010-Acquire-right-to-set-hint-bits-in-the-remaining-p.patch text/x-diff 6.6 KB
v1-0011-bufmgr-Detect-some-missing-BufferPrepareToSetHint.patch text/x-diff 1.5 KB
v1-0012-bufmgr-Don-t-copy-pages-while-writing-out.patch text/x-diff 8.2 KB
v1-0013-WIP-bufmgr-Detect-some-bad-buffer-accesses.patch text/x-diff 13.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-10-30 02:44:52 Re: Cleanup SubPlanstate
Previous Message jian he 2024-10-30 02:05:00 Re: make all ereport in gram.y print out relative location