Re: Improve CRC32C performance on SSE4.2

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com>
Subject: Re: Improve CRC32C performance on SSE4.2
Date: 2025-02-19 05:19:05
Message-ID: CANWCAZZSB12uW8r7GCqnZRdEcO55OYO4smKJvkvWdgEyeMVeBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 19, 2025 at 1:28 AM Devulapalli, Raghuveer
<raghuveer(dot)devulapalli(at)intel(dot)com> wrote:
> The runtime detection code could also be appended with function __attribute__((constructor)) so that it gets executed before main.

Hmm! I didn't know about that, thanks! It works on old gcc/clang, so
that's good. I can't verify online if Arm etc. executes properly since
the execute button is greyed out, but I don't see why it wouldn't:

https://godbolt.org/z/3as9MGr3G

Then we could have:

#ifdef FRONTEND
pg_attribute_constructor()
#endif
void
pg_cpucap_initialize(void)
{
...
}

Now, there is no equivalent on MSVC and workarounds are fragile [1].
Maybe we could only assert initialization happened for the backend and
for frontend either
- add a couple manual initializations to to the frontend programs
where we don't want to lose performance for non-gcc/clang.
- require CRC on x86-64 MSVC since Windows 10 is EOL soon, going by
Thomas M.'s earlier findings on popcount (also SSE4.2) [2]

The first is less risky but less tidy.

> > I think the runtime dispatch is fairly simple for this case.
> + pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len)
> + {
> + ....
> + #ifdef HAVE_PCLMUL_RUNTIME
> + if (len >= PCLMUL_THRESHOLD && (pg_cpucap & PGCPUCAP_CLMUL))
>
> IMO, the pclmul and sse4.2 versions should be dispatched independently and the dispatch logic should be moved out of the pg_crc32c.h to it own pg_crc32c_dispatch.c file.

That makes sense, but it does close the door on future inlining.

> Also, thank you for taking lead on developing this patch. Since the latest patch seems to be in a good shape, I'm happy to provide feedback and review your work, or can continue development work based on any feedback. Please let me know which you'd prefer.

Sure. Depending on any feedback on the above constructor technique,
I'll work on the following, since I've prototyped most of the first
and the second is mostly copy-and-paste from your earlier work:

> > pg_cpucap_crc32c
>
> I like the idea of moving all CPU runtime detection to a single module outside of implementations. This makes it easy to extend in the future and keeps the functionalities separate.
>
> > - Rename the CRC choose*.c files to pg_cpucap_{x86,arm}.c and build them
> > unconditionally for each platform
>
> +1. Sounds perfect. We should also move the avx512 runtime detection of popcount here.

[1] https://stackoverflow.com/questions/1113409/attribute-constructor-equivalent-in-vc
[2] https://www.postgresql.org/message-id/CA+hUKGKS64zJezV9y9mPcB-J0i+fLGiv3FAdwSH_3SCaVdrjyQ@mail.gmail.com

--
John Naylor
Amazon Web Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniil Davydov 2025-02-19 06:13:07 Re: Forbid to DROP temp tables of other sessions
Previous Message Shubham Khanna 2025-02-19 04:35:39 Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.