From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Amonson, Paul D" <paul(dot)d(dot)amonson(at)intel(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Popcount optimization using AVX512 |
Date: | 2024-04-02 15:53:01 |
Message-ID: | 20240402155301.GA2750455@nathanxps13 |
Lists: | pgsql-hackers |
On Mon, Apr 01, 2024 at 05:11:17PM -0500, Nathan Bossart wrote:
> Here is a v19 of the patch set. I moved out the refactoring of the
> function pointer selection code to 0001. I think this is a good change
> independent of $SUBJECT, and I plan to commit this soon. In 0002, I
> changed the syslogger.c usage of pg_popcount() to use pg_number_of_ones
> instead. This is standard practice elsewhere where the popcount functions
> are unlikely to win. I'll probably commit this one soon, too, as it's even
> more trivial than 0001.
>
> 0003 is the AVX512 POPCNT patch. Besides refactoring out 0001, there are
> no changes from v18. 0004 is an early proof-of-concept for using AVX512
> for the visibility map code. The code is missing comments, and I haven't
> performed any benchmarking yet, but I figured I'd post it because it
> demonstrates how it's possible to build upon 0003 in other areas.

I've committed the first two patches, and I've attached a rebased version
of the latter two.

> AFAICT the main open question is the function call overhead in 0003 that
> Alvaro brought up earlier. After 0002 is committed, I believe the only
> in-tree caller of pg_popcount() with very few bytes is bit_count(), and I'm
> not sure it's worth expending too much energy to make sure there are
> absolutely no regressions there. However, I'm happy to do so if folks feel
> that it is necessary, and I'd be grateful for thoughts on how to proceed on
> this one.

Another idea I had is to turn pg_popcount() into a macro that just uses the
pg_number_of_ones array when called for few bytes:

    static inline uint64
    pg_popcount_inline(const char *buf, int bytes)
    {
        uint64 popcnt = 0;

        while (bytes--)
            popcnt += pg_number_of_ones[(unsigned char) *buf++];

        return popcnt;
    }

    #define pg_popcount(buf, bytes) \
        ((bytes < 64) ? \
         pg_popcount_inline(buf, bytes) : \
         pg_popcount_optimized(buf, bytes))

But again, I'm not sure this is really worth it for the current use-cases.
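
For anyone who wants to try the small-input dispatch outside the tree, here is a minimal standalone sketch of the same idea. It is not taken from the patch. Every demo_* name is hypothetical, the per-byte count is computed instead of read from pg_number_of_ones, and demo_popcount_optimized() is only a portable word-at-a-time stand-in for the real out-of-line function. The dispatcher is written as an inline function rather than a macro, which is an assumption of this sketch and avoids evaluating the arguments twice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, mirroring the 64-byte threshold in the macro above. */
#define DEMO_INLINE_THRESHOLD 64

/* Count the set bits in one byte by clearing the lowest set bit repeatedly. */
static inline unsigned
demo_byte_popcount(unsigned char b)
{
    unsigned n = 0;

    while (b)
    {
        b &= (unsigned char) (b - 1);
        n++;
    }
    return n;
}

/* Byte-at-a-time path, cheap enough to inline at the call site. */
static inline uint64_t
demo_popcount_inline(const char *buf, size_t bytes)
{
    uint64_t popcnt = 0;

    while (bytes--)
        popcnt += demo_byte_popcount((unsigned char) *buf++);
    return popcnt;
}

/*
 * Stand-in for the out-of-line optimized path (an assumption of this sketch):
 * processes 8 bytes at a time with a bit-parallel count, then finishes the
 * tail with the byte-at-a-time loop.
 */
static uint64_t
demo_popcount_optimized(const char *buf, size_t bytes)
{
    uint64_t popcnt = 0;

    while (bytes >= sizeof(uint64_t))
    {
        uint64_t word;

        memcpy(&word, buf, sizeof(word));   /* avoids unaligned reads */
        word = word - ((word >> 1) & 0x5555555555555555ULL);
        word = (word & 0x3333333333333333ULL) +
               ((word >> 2) & 0x3333333333333333ULL);
        word = (word + (word >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
        popcnt += (word * 0x0101010101010101ULL) >> 56;

        buf += sizeof(uint64_t);
        bytes -= sizeof(uint64_t);
    }
    return popcnt + demo_popcount_inline(buf, bytes);
}

/* Dispatcher: short inputs stay inline, longer ones pay for the call. */
static inline uint64_t
demo_popcount(const char *buf, size_t bytes)
{
    return (bytes < DEMO_INLINE_THRESHOLD)
        ? demo_popcount_inline(buf, bytes)
        : demo_popcount_optimized(buf, bytes);
}
```

With this shape, a caller that passes only a few bytes never leaves the inline loop, while large buffers take the out-of-line path. Whether 64 is the right cutoff would need benchmarking.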
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v20-0001-AVX512-popcount-support.patch | text/x-diff | 28.7 KB |
v20-0002-optimize-visibilitymap_count-with-AVX512.patch | text/x-diff | 9.4 KB |