From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | Re: AIO writes vs hint bits vs checksums |
Date: | 2024-10-30 11:29:01 |
Message-ID: | 6f921677-570e-492b-95fc-4d5d9b4179f5@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 30/10/2024 04:21, Andres Freund wrote:
> Attached is a, unfortunately long, series of patches implementing what I
> described upthread.
Review of the preparatory patches:
> 0001 Add very basic test for kill_prior_tuples
>
> We currently don't exercise this patch for gist and hash, which seems
> somewhat criminal.
Interesting to use the isolationtester for this. There's just one
session, so you're just using it to define reusable steps with handy
names. I'm fine with that, but please add a comment to explain it. I
wonder if it'd be more straightforward to make it a regular pg_regress
test though. There would be some repetition, but would it be so bad?
You forgot to add the new test to 'isolation_schedule'.
typos:
"inex" -> "index"
"does something approximately reasonble" -> "do something
approximately reasonable"
> 0002 heapam: Move logic to handle HEAP_MOVED into a helper function
>
> A prep patch, which consolidates HEAP_MOVED handling into a helper. This
> isn't strictly necessary, but I got bothered by it while writing this patch
> series.
+1
> +/*
> + * If HEAP_MOVED_OFF or HEAP_MOVED_IN are set on the tuple, remove them and
> + * adjust hint bits. See the comment for SetHintBits() for more background.
> + *
> + * This helper returns false if the row ought to be invisible, true otherwise.
> + */
> +static inline bool
> +HeapTupleCleanMoved(HeapTupleHeader tuple, Buffer buffer)
> +{
> + TransactionId xvac;
> +
> + /* only used by pre-9.0 binary upgrades */
> + if (likely(!(tuple->t_infomask & (HEAP_MOVED_OFF | HEAP_MOVED_IN))))
> + return true;
> +
> + ...
> +}
This is so unlikely these days that this function probably should not be
inlined anymore.
> 0003 bufmgr: Add BufferLockHeldByMe()
>
> Useful for writing assertions in later patches.
> + if (mode == BUFFER_LOCK_EXCLUSIVE)
> + lwmode = LW_EXCLUSIVE;
> + else if (mode == BUFFER_LOCK_SHARE)
> + lwmode = LW_SHARED;
> + else
> + {
> + Assert(false);
> + pg_unreachable();
> + lwmode = LW_EXCLUSIVE; /* assuage compiler */
> + }
I just learned a new word :-).
This is fine, but to avoid the assuaging, you could also do this:
if (mode == BUFFER_LOCK_EXCLUSIVE)
lwmode = LW_EXCLUSIVE;
else
{
Assert(mode == BUFFER_LOCK_SHARE);
lwmode = LW_SHARED;
}
> 0004 heapam: Use exclusive lock on old page in CLUSTER
>
> When I started looking into this I wanted to make sure that it's actually
> safe to skip setting hint bits in all places. One place that's *not* safe
> is cluster - we assume that all hint bits are set in rewrite_heap_tuple().
>
> This might not strictly be required, but it seemed very fragile as
> is. While using an exclusive lock does seem like it makes it a bit safer,
> it's not a very good fix, as we still dirty the old buffer, which seems
> like a shame when one uses VACUUM FULL.
> + /*
> + * To be able to guarantee that we can set the hint bit, acquire an
> + * exclusive lock on the old buffer. We need the hint bits to be set
> + * as otherwise reform_and_rewrite_tuple() -> rewrite_heap_tuple() ->
> + * heap_freeze_tuple() will get confused.
> + *
Hmm, I don't see any comments in the reform_and_rewrite_tuple() ->
rewrite_heap_tuple() -> heap_freeze_tuple() path about requiring hint
bits to be set. How does it get confused?
> 0005 heapam: Only set tuple's block once per page in pagemode
>
> Minor, but measurable, optimization.
+1, this makes sense independently of all the other patches.
> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 1748eafa100..d0cb4b1e29b 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -981,6 +981,9 @@ heapgettup_pagemode(HeapScanDesc scan,
> linesleft = scan->rs_ntuples;
> lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
>
> + /* set block once per page, instead of doing so for every tuple */
> + BlockIdSet(&tuple->t_self.ip_blkid, scan->rs_cblock);
> +
> /* lineindex now references the next or previous visible tid */
> continue_page:
>
> @@ -995,7 +998,7 @@ continue_page:
>
> tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
> tuple->t_len = ItemIdGetLength(lpp);
> - ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
> + tuple->t_self.ip_posid = lineoff;
>
> /* skip any tuples that don't match the scan key */
> if (key != NULL &&
Should probably use ItemPointerSetBlockNumber and
ItemPointerSetOffsetNumber.
> 0006 heapam: Add batch mode mvcc check and use it in page mode
>
> This is a good bit faster on its own, but is required to avoid a
> performance regression later, when setting hint bits only only when no IO
> going on at the same.
The BATCHMVCC_FEWER_ARGS stuff is pretty weird. Hard to imagine that
it'd make any difference how you pass those arguments.
> + if (likely(valid))
> + {
> + vistuples_dense[nvis] = tup->t_self.ip_posid;
> + nvis++;
> + }
Is the likely(valid) hint important for performance here? It would be a
bad hint for a table with lots of dead tuples. I feel we'd better rely
on the branch predictor for this one.
> 0007 bufmgr: Make it easier to change number of buffer state bits
>
> This just makes it easier to change the division of bits in
> BufferDesc->state.
Looks good to me.
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Michail Nikolaev | 2024-10-30 11:31:29 | Re: Conflict detection for update_deleted in logical replication |
Previous Message | Nikita Malakhov | 2024-10-30 11:22:42 | Re: Avoid detoast overhead when possible |