Re: [PATCH] Hex-coding optimizations using SVE on ARM.

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>
Cc: "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject: Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Date: 2025-01-10 20:46:45
Message-ID: Z4GHNfhRKuA0r_Wn@nathan
Lists: pgsql-hackers

On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote:
> On Fri, Jan 10, 2025 at 11:10:03AM +0000, Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com wrote:
>> We tried auto-vectorization and observed no performance improvement.
>
> Do you mean that the auto-vectorization worked and you observed no
> performance improvement, or the auto-vectorization had no effect on the
> code generated?

I was able to get auto-vectorization to take effect on Apple clang 16 with
the following addition to src/backend/utils/adt/Makefile:

encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8

This gave the following results with your hex_encode_test() function:

  buf  | HEAD  | patch | % diff
-------+-------+-------+--------
    16 |    21 |    16 |     24
    64 |    54 |    41 |     24
   256 |   138 |   100 |     28
  1024 |   441 |   300 |     32
  4096 |  1671 |  1106 |     34
 16384 |  6890 |  4570 |     34
 65536 | 27393 | 18054 |     34
This doesn't compare with the gains you are claiming to see with
intrinsics, but it's not bad for a one-line change.  I bet there are ways
to adjust the code so that the auto-vectorization is more effective, too.
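
As a rough illustration of the kind of adjustment I have in mind, here is a
minimal sketch that computes each hex digit arithmetically and indexes the
output by position, so that the loop body has no table loads and no pointer
bumps carried between iterations.  This is untested for performance and is
not the code in encode.c; the function names are hypothetical, and whether a
given compiler vectorizes it any better would need to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not PostgreSQL's hex_encode().  Maps a value in
 * 0..15 to its lowercase hex digit without a lookup table.
 */
static inline char
nibble_to_hex(uint8_t n)
{
	/* 0..9 -> '0'..'9'; 10..15 -> 'a'..'f' via the extra offset */
	return (char) (n + '0' + (n > 9) * ('a' - '0' - 10));
}

/*
 * Writes 2 * len hex digits to dst (no NUL terminator) and returns the
 * number of bytes written.  "restrict" promises the compiler that src and
 * dst do not overlap, which is an assumption about the callers.
 */
size_t
hex_encode_sketch(const uint8_t *restrict src, size_t len, char *restrict dst)
{
	for (size_t i = 0; i < len; i++)
	{
		dst[2 * i] = nibble_to_hex(src[i] >> 4);
		dst[2 * i + 1] = nibble_to_hex(src[i] & 0xF);
	}
	return len * 2;
}
```

The "restrict" qualifiers and the index-based stores are the parts most
likely to matter to the vectorizer, but that is a guess until someone looks
at the generated code.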

--
nathan
