Re: suspicious valgrind reports about radixtree/tidstore on arm64

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: suspicious valgrind reports about radixtree/tidstore on arm64
Date: 2024-06-20 01:11:37
Message-ID: CAD21AoAjnvXxgoqAnyO4sfxBBxTSzmPz6BLuwWsrZEnLAPgFaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Jun 20, 2024 at 7:54 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > I've reproduced what looks like about the same thing on
> > my Pi 4 using Fedora 38: just run "make installcheck-parallel"
> > under valgrind, and kaboom. Definitely needs investigation.
>
> The problem appears to be that RT_ALLOC_NODE doesn't bother to
> initialize the chunks[] array when making a RT_NODE_16 node.
> If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries,
> then when RT_NODE_16_SEARCH_EQ applies vector operations that
> read the entire array, it's operating partially on uninitialized
> data. Now, that's actually OK because of the "mask off invalid
> entries" step, but aarch64 valgrind complains anyway.
>
> I hypothesize that the reason we're not seeing equivalent failures
> on x86_64 is one of
>
> 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> the contents of the SIMD registers are only partially defined.
>
> 2. x86_64 valgrind is smarter than aarch64, and is able to see
> that the "mask off invalid entries" step removes all the
> potentially-uninitialized bits.
>
> The first attached patch, "radixtree-fix-minimal.patch", is enough
> to stop the aarch64 valgrind failure for me. However, I think
> that the coding here is pretty penny-wise and pound-foolish,
> and that what we really ought to do is the second patch,
> "radixtree-fix-proposed.patch". I do not believe that asking
> memset to zero the three-byte RT_NODE structure produces code
> that's either shorter or faster than having it zero 8 bytes
> (as for RT_NODE_4) or having it do that and then separately
> zero some more stuff (as for the larger node types). Leaving
> RT_NODE_4's chunks[] array uninitialized is going to bite us
> someday, too, even if it doesn't right now. So I think we
> ought to just zero the whole fixed-size part of the nodes,
> which is what radixtree-fix-proposed.patch does.

I agree with radixtree-fix-proposed.patch. Even if we zero more fields
in the node it would not add noticeable overheads.

>
> (The RT_NODE_48 case could be optimized a bit if we cared
> to swap the order of its slot_idxs[] and isset[] arrays;
> then the initial zeroing could just go up to slot_idxs[].
> I don't know if there's any reason why the current order
> is preferable.)

IIUC there is no particular reason for the current order in RT_NODE_48.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-06-20 01:29:08 datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE
Previous Message Peter Smith 2024-06-20 01:05:35 Re: DOCS: Generated table columns are skipped by logical replication