From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com> |
Cc: | "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>, Salvatore Dipietro <dipiets(at)amazon(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] SVE popcount support |
Date: | 2025-02-05 16:11:05 |
Message-ID: | Z6ONmQVSD5Qnpbsl@nathan |
Lists: | pgsql-hackers |
On Tue, Feb 04, 2025 at 09:01:33AM +0000, Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com wrote:
>> + /*
>> + * For smaller inputs, aligning the buffer degrades the performance.
>> + * Therefore, align the buffers only when the input size is sufficiently large.
>> + */
>
>> Is the inverse true, i.e., does aligning the buffer improve performance for
>> larger inputs? I'm also curious what level of performance degradation you
>> were seeing.
>
> Here is a comparison of all three cases. Alignment is marginally better for inputs
> above 1024B, but the difference is small. Unaligned performs better for smaller inputs.
> Aligned After 128B => the current implementation "if (aligned != buf && bytes > 4 * vec_len)"
> Always Aligned => condition "bytes > 4 * vec_len" is removed.
> Unaligned => the whole if block was removed
>
> buf (bytes) | Always Aligned | Aligned After 128B | Unaligned
> ------------+----------------+--------------------+------------
> 16 | 37.851 | 38.203 | 34.971
> 32 | 37.859 | 38.187 | 34.972
> 64 | 37.611 | 37.405 | 34.121
> 128 | 45.357 | 45.897 | 41.890
> 256 | 62.440 | 63.454 | 58.666
> 512 | 100.120 | 102.767 | 99.861
> 1024 | 159.574 | 158.594 | 164.975
> 2048 | 282.354 | 281.198 | 283.937
> 4096 | 532.038 | 531.068 | 533.699
> 8192 | 1038.973 | 1038.083 | 1039.206
> 16384 | 2028.604 | 2025.843 | 2033.940
Hm. These results are so similar that I'm tempted to suggest we just
remove the section of code dedicated to alignment. Is there any reason not
to do that?
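To make the three variants concrete, here is a minimal, hypothetical scalar sketch of the "Aligned After 128B" approach. It works on plain 8-byte words with the GCC/Clang builtin __builtin_popcountll() rather than SVE intrinsics, and SKETCH_VEC_LEN, popcount_bytes(), and popcount_sketch() are illustrative names, not anything from the patch. Deleting the `if (bytes > 4 * SKETCH_VEC_LEN)` block gives the "Unaligned" variant, and dropping only its size condition gives "Always Aligned"; the count is identical in all three, so the choice is purely about speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical vector width in bytes.  Real SVE code only learns the width
 * at run time (e.g. via svcntb()); a constant keeps the sketch portable.
 */
#define SKETCH_VEC_LEN 32

/* Count set bits one byte at a time (used for the unaligned head and tail). */
static uint64_t
popcount_bytes(const unsigned char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	for (size_t i = 0; i < bytes; i++)
		popcnt += (uint64_t) __builtin_popcount(buf[i]);
	return popcnt;
}

/*
 * Hypothetical "Aligned After 128B" variant: only for inputs larger than
 * four vectors, peel off leading bytes until the pointer reaches a
 * SKETCH_VEC_LEN boundary, then consume whole 8-byte words.
 */
static uint64_t
popcount_sketch(const char *buf, size_t bytes)
{
	const unsigned char *p = (const unsigned char *) buf;
	uint64_t	popcnt = 0;

	if (bytes > 4 * SKETCH_VEC_LEN)
	{
		size_t		misalign = (uintptr_t) p % SKETCH_VEC_LEN;

		if (misalign != 0)
		{
			size_t		head = SKETCH_VEC_LEN - misalign;

			/* head < SKETCH_VEC_LEN < bytes, so the peel cannot overrun. */
			assert(head < bytes);
			popcnt += popcount_bytes(p, head);
			p += head;
			bytes -= head;
		}
	}

	/* Main loop over 8-byte words; memcpy keeps unaligned loads well-defined. */
	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, p, sizeof(word));
		popcnt += (uint64_t) __builtin_popcountll(word);
		p += sizeof(word);
		bytes -= sizeof(word);
	}

	/* Remaining tail bytes. */
	return popcnt + popcount_bytes(p, bytes);
}
```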
+ /* Process 2 complete vectors */
+ for (; i < loop_bytes; i += vec_len * 2)
+ {
+ vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i)), mask64);
+ accum1 = svadd_x(pred, accum1, svcnt_x(pred, vec64));
+ vec64 = svand_x(pred, svld1(pred, (const uint64 *) (buf + i + vec_len)), mask64);
+ accum2 = svadd_x(pred, accum2, svcnt_x(pred, vec64));
+ }
Does this hand-rolled loop unrolling offer any particular advantage? What
do the numbers look like if we don't do this or if we process, say, 4
vectors at a time?
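For comparison, a hypothetical scalar sketch of the unrolling choices being asked about: one accumulator, two independent accumulators (the shape of accum1/accum2 in the hunk above), and four. It again uses __builtin_popcountll() on 8-byte words instead of svcnt_x() on SVE vectors, so it only illustrates the loop structure; the function names are made up, and whether extra accumulators actually pay off on SVE hardware is exactly what would need benchmarking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: one accumulator, so as written each add depends on the last. */
static uint64_t
popcount_1acc(const uint64_t *words, size_t n)
{
	uint64_t	acc = 0;

	for (size_t i = 0; i < n; i++)
		acc += (uint64_t) __builtin_popcountll(words[i]);
	return acc;
}

/* Two independent accumulators, mirroring accum1/accum2 in the patch. */
static uint64_t
popcount_2acc(const uint64_t *words, size_t n)
{
	uint64_t	acc1 = 0,
				acc2 = 0;
	size_t		i = 0;

	for (; i + 2 <= n; i += 2)
	{
		acc1 += (uint64_t) __builtin_popcountll(words[i]);
		acc2 += (uint64_t) __builtin_popcountll(words[i + 1]);
	}
	/* At most one word is left over. */
	if (i < n)
		acc1 += (uint64_t) __builtin_popcountll(words[i]);
	return acc1 + acc2;
}

/* Four independent accumulators: the "4 vectors at a time" variant. */
static uint64_t
popcount_4acc(const uint64_t *words, size_t n)
{
	uint64_t	acc1 = 0,
				acc2 = 0,
				acc3 = 0,
				acc4 = 0;
	size_t		i = 0;

	for (; i + 4 <= n; i += 4)
	{
		acc1 += (uint64_t) __builtin_popcountll(words[i]);
		acc2 += (uint64_t) __builtin_popcountll(words[i + 1]);
		acc3 += (uint64_t) __builtin_popcountll(words[i + 2]);
		acc4 += (uint64_t) __builtin_popcountll(words[i + 3]);
	}
	/* Fewer than four words remain; finish them with a single accumulator. */
	assert(n - i < 4);
	for (; i < n; i++)
		acc1 += (uint64_t) __builtin_popcountll(words[i]);
	return acc1 + acc2 + acc3 + acc4;
}
```

All three return the same count; the only difference is how many independent add chains the loop body exposes to the CPU.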
--
nathan