From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com> |
Cc: | "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |
Date: | 2025-01-10 20:46:45 |
Message-ID: | Z4GHNfhRKuA0r_Wn@nathan |
Lists: | pgsql-hackers |
On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote:
> On Fri, Jan 10, 2025 at 11:10:03AM +0000, Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com wrote:
>> We tried auto-vectorization and observed no performance improvement.
>
> Do you mean that the auto-vectorization worked and you observed no
> performance improvement, or the auto-vectorization had no effect on the
> code generated?
I was able to get auto-vectorization to take effect on Apple clang 16 with
the following addition to src/backend/utils/adt/Makefile:
    encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8
This gave the following results with your hex_encode_test() function:
  buf  | HEAD  | patch | % diff
-------+-------+-------+--------
    16 |    21 |    16 |     24
    64 |    54 |    41 |     24
   256 |   138 |   100 |     28
  1024 |   441 |   300 |     32
  4096 |  1671 |  1106 |     34
 16384 |  6890 |  4570 |     34
 65536 | 27393 | 18054 |     34
This doesn't compare with the gains you are claiming to see with
intrinsics, but it's not bad for a one-line change. I bet there are ways
to adjust the code so that the auto-vectorization is more effective, too.
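
As one illustration of what "adjusting the code" for the vectorizer could
look like, here is a minimal sketch. It is not PostgreSQL's actual
hex_encode(); the names hex_encode_sketch and nibble_to_hex are hypothetical.
It replaces a table lookup with branch-free arithmetic and marks the buffers
as non-overlapping. Whether a given compiler vectorizes this loop, and
whether that is a win, is an assumption that would have to be checked against
the generated code and a benchmark.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the PostgreSQL tree): map a nibble in
 * 0..15 to its lowercase hex digit using arithmetic only, so the loop
 * body has no table load and no branch.
 */
static inline char
nibble_to_hex(uint8_t n)
{
	/* 0..9 -> '0'..'9'; 10..15 -> 'a'..'f' (adds 'a' - '0' - 10 when n > 9) */
	return (char) (n + '0' + (n > 9) * ('a' - '0' - 10));
}

/*
 * Hypothetical encoder: writes 2 * len hex digits to dst and returns
 * that count.  dst is not NUL-terminated.  "restrict" promises that src
 * and dst do not overlap, which spares the compiler from having to
 * assume aliasing between the two buffers.
 */
static size_t
hex_encode_sketch(const char *restrict src, size_t len, char *restrict dst)
{
	for (size_t i = 0; i < len; i++)
	{
		uint8_t		b = (uint8_t) src[i];

		dst[2 * i] = nibble_to_hex(b >> 4);		/* high nibble */
		dst[2 * i + 1] = nibble_to_hex(b & 0xF);	/* low nibble */
	}
	return len * 2;
}
```

The caller is expected to supply a dst buffer of at least 2 * len bytes and
to add a terminator itself if one is needed.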
--
nathan