From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Use POPCNT on MSVC |
Date: | 2021-08-04 01:05:06 |
Message-ID: | CA+hUKG+JEJyRmeC_f6-j6kyLWONzTU-rMu2f=0oQJFZ8mp=jbw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Aug 3, 2021 at 10:43 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> (Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even though they have no fast implementation to make it worth their while. It would be better for them if the "slow" implementation was called static inline or at least a direct function call, but that's a separate thread.)
+1
I haven't looked into whether we could benefit from it in real use
cases, but it seems like it'd also be nice if pg_popcount() were a
candidate for auto-vectorisation and inlining. For example, NEON has
vector popcount, and for Intel/AMD there is a shuffle-based AVX2 trick
that at least Clang produces automatically[1]. We're obstructing that
by doing function dispatch at individual word level, and using inline
assembler instead of builtins.
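
To illustrate the point, here is a minimal sketch of a popcount loop written so that the compiler sees the whole buffer at once. It is not PostgreSQL's pg_popcount(): the name popcount_buffer_sketch is hypothetical, and the code assumes GCC or Clang, since __builtin_popcountll and __builtin_popcount are compiler builtins rather than standard C. With no function pointer and no inline assembler in the loop body, the compiler is at least free to inline and vectorise it; whether it actually does depends on the compiler, flags and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL's pg_popcount(): counts the set bits
 * in a buffer with a plain loop over 64-bit words, using a compiler builtin
 * rather than inline assembler or a per-word function pointer.
 * Assumes GCC or Clang for the __builtin_popcount* functions.
 */
static inline uint64_t
popcount_buffer_sketch(const char *buf, size_t bytes)
{
    uint64_t    popcnt = 0;

    /* Whole 64-bit words; memcpy avoids any alignment assumption. */
    while (bytes >= sizeof(uint64_t))
    {
        uint64_t    word;

        memcpy(&word, buf, sizeof(word));
        popcnt += (uint64_t) __builtin_popcountll(word);
        buf += sizeof(word);
        bytes -= sizeof(word);
    }

    /* Remaining tail bytes, one at a time. */
    while (bytes-- > 0)
        popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf++);

    return popcnt;
}
```

One way to check what a given compiler makes of this is to build it with something like `clang -O3 -mavx2 -S` (or the equivalent for an AArch64 target) and look at the generated loop.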