From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | "Devulapalli, Raghuveer" <raghuveer(dot)devulapalli(at)intel(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shankaran, Akash" <akash(dot)shankaran(at)intel(dot)com> |
Subject: | Re: Improve CRC32C performance on SSE4.2 |
Date: | 2025-03-10 21:47:37 |
Message-ID: | Z89d-eRP9Tpd5Mq1@nathan |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote:
> On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> Overall, I wish we could avoid splitting things into separate files and
>> adding more header file gymnastics, but maybe there isn't much better we
>> can do without overhauling the CPU feature detection code.
>
> I wanted to make an attempt to make this aspect nicer. v13-0002
> incorporates deliberately compact and simple loops for inlined
> constant input into the dispatch function, and leaves the existing
> code alone. This avoids code churn and saves vertical space in the
> copied code. It needs a bit more commentary, but I hope this is a more
> digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
> be simpler if we can always assume non-constant input can go through a
> function pointer.
That is certainly more readable. FWIW I think it would be entirely
reasonable to replace the pg_crc32c_sse42.c implementation with a call to
this new pg_comp_crc32c_dispatch() function. Of course, you'd have to
split things up like:
pg_attribute_no_sanitize_alignment()
static inline
pg_crc32c
pg_comp_crc32c_sse42_inline(pgcrc32c crc, const void *data, size_t len)
{
const unsigned char *p = data;
#ifdef __x86_64__
for (; len >= 8; p += 8, len -= 8)
crc = _mm_crc32_u64(crc, *(const uint64 *) p);
#endif
for (; len >= 4; p += 4, len -= 4)
crc = _mm_crc32_u32(crc, *(const uint32 *) p);
for (; len > 0; len--)
crc = _mm_crc32_u8(crc, *p++)
return crc;
}
pg_attribute_no_sanitize_alignment()
static inline
pg_crc32c
pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
{
if (__builtin_constant_p(len))
return pg_comp_crc32c_sse42_inline(crc, data, len);
return pg_comp_crc32c_sse42(crc, data, len);
}
And then in pg_crc32c_sse42.c:
pg_attribute_no_sanitize_alignment()
pg_attribute_target("sse4.2")
pg_crc32c
pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len)
{
return pg_comp_crc32c_sse42_inline(crc, data, len);
}
Granted, that adds back some of the complexity you were trying to avoid
(and is very similar to your v12 patches), but it's pretty compact and
avoids some code duplication. FTR I don't feel too strongly about this,
but on balance I guess I'd be okay with a tad more complexity than your
v13 patches if it served a useful purpose.
--
nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-03-10 21:53:26 | Re: Statistics Import and Export: difference in statistics dumped |
Previous Message | David G. Johnston | 2025-03-10 21:19:22 | Documentation Edits for pg_createsubscriber |