From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com> |
Subject: | Re: suspicious valgrind reports about radixtree/tidstore on arm64 |
Date: | 2024-06-20 01:11:37 |
Message-ID: | CAD21AoAjnvXxgoqAnyO4sfxBBxTSzmPz6BLuwWsrZEnLAPgFaQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Thu, Jun 20, 2024 at 7:54 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > I've reproduced what looks like about the same thing on
> > my Pi 4 using Fedora 38: just run "make installcheck-parallel"
> > under valgrind, and kaboom. Definitely needs investigation.
>
> The problem appears to be that RT_ALLOC_NODE doesn't bother to
> initialize the chunks[] array when making a RT_NODE_16 node.
> If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries,
> then when RT_NODE_16_SEARCH_EQ applies vector operations that
> read the entire array, it's operating partially on uninitialized
> data. Now, that's actually OK because of the "mask off invalid
> entries" step, but aarch64 valgrind complains anyway.
>
> I hypothesize that the reason we're not seeing equivalent failures
> on x86_64 is one of
>
> 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> the contents of the SIMD registers are only partially defined.
>
> 2. x86_64 valgrind is smarter than aarch64, and is able to see
> that the "mask off invalid entries" step removes all the
> potentially-uninitialized bits.
>
> The first attached patch, "radixtree-fix-minimal.patch", is enough
> to stop the aarch64 valgrind failure for me. However, I think
> that the coding here is pretty penny-wise and pound-foolish,
> and that what we really ought to do is the second patch,
> "radixtree-fix-proposed.patch". I do not believe that asking
> memset to zero the three-byte RT_NODE structure produces code
> that's either shorter or faster than having it zero 8 bytes
> (as for RT_NODE_4) or having it do that and then separately
> zero some more stuff (as for the larger node types). Leaving
> RT_NODE_4's chunks[] array uninitialized is going to bite us
> someday, too, even if it doesn't right now. So I think we
> ought to just zero the whole fixed-size part of the nodes,
> which is what radixtree-fix-proposed.patch does.
I agree with radixtree-fix-proposed.patch. Even if we zero more fields
in the node it would not add noticeable overheads.
>
> (The RT_NODE_48 case could be optimized a bit if we cared
> to swap the order of its slot_idxs[] and isset[] arrays;
> then the initial zeroing could just go up to slot_idxs[].
> I don't know if there's any reason why the current order
> is preferable.)
IIUC there is no particular reason for the current order in RT_NODE_48.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2024-06-20 01:29:08 | datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE |
Previous Message | Peter Smith | 2024-06-20 01:05:35 | Re: DOCS: Generated table columns are skipped by logical replication |