From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Yuqi Gu <Yuqi(dot)Gu(at)arm(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize Arm64 crc32c implementation in Postgresql |
Date: | 2018-03-01 21:36:01 |
Message-ID: | 20180301213601.xhcsgwinf6fo6vq6@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2018-01-10 05:58:19 +0000, Yuqi Gu wrote:
> Currently PostgreSQL only implements hardware support for CRC32 checksums for the x86_64 architecture.
> Some ARMv8 (AArch64) CPUs implement the CRC32 extension which is implemented by inline assembly,
> so they can also benefit from hardware acceleration in IO-intensive workloads.
>
> I would like to propose the patch to optimize crc32c calculation with Arm64 specific instructions.
> The hardware-specific code implementation is used under #if defined USE_ARMCE_CRC32C_WITH_RUNTIME_CHECK.
> And the performance is improved on platforms: cortex-A57, cortex-A72, cortex-A73, etc.
>
> I'll create a CommitFests ticket for this submission.
> Any comments or feedback are welcome.
Could you show whether it actually improves performance? Usually bulk
loading data with parallel COPYs is a good way to hit this codepath.
> +
> +AC_DEFUN([PGAC_ARM64CE_CRC32_INTRINSICS],
> +[AC_CACHE_CHECK([for Arm64ce CRC32], [pgac_cv_arm64ce_crc32_intrinsics],
> +[AC_LINK_IFELSE([AC_LANG_PROGRAM([],
> + [unsigned int arm_flag = 0;
> +#if defined(__ARM_ARCH) && (__ARM_ARCH > 7)
> + arm_flag = 1;
> +#endif
> + return arm_flag == 1;])],
> + [pgac_cv_arm64ce_crc32_intrinsics="yes"],
> + [pgac_cv_arm64ce_crc32_intrinsics="no"])])
> +if test x"$pgac_cv_arm64ce_crc32_intrinsics" = x"yes"; then
> + pgac_arm64ce_crc32_intrinsics=yes
> +fi
> +])# PGAC_ARM64CE_CRC32_INTRINSICS
I don't understand what this check is supposed to be doing?
AC_LINK_IFELSE doesn't run the program, so I don't see this test
achieving anything at all?
> * Use slicing-by-8 algorithm.
> diff --git a/src/port/pg_crc32c_choose.c b/src/port/pg_crc32c_choose.c
> index 40bee67..d3682ad 100644
> --- a/src/port/pg_crc32c_choose.c
> +++ b/src/port/pg_crc32c_choose.c
> @@ -29,6 +29,20 @@
>
> #include "port/pg_crc32c.h"
>
> +#ifdef USE_ARMCE_CRC32C_WITH_RUNTIME_CHECK
> +#include <sys/auxv.h>
> +#include <asm/hwcap.h>
> +#ifndef HWCAP_CRC32
> +#define HWCAP_CRC32 (1 << 7)
> +#endif
> +static bool
> +pg_crc32c_arm64ce_available(void) {
> + unsigned long auxv = getauxval(AT_HWCAP);
> + return (auxv & HWCAP_CRC32) != 0;
> +}
> +
> +#else
What's the availability of these headers and functions on non-linux platforms?
> +#if defined(USE_ARMCE_CRC32C_WITH_RUNTIME_CHECK)
> +asm(".arch_extension crc");
So this emitted globally?
> +#define LDP(x,y,p) asm("ldp %x[a], %x[b], [%x[c]], #16" : [a]"=r"(x),[b]"=r"(y),[c]"+r"(p))
> +/* CRC32C: Castagnoli polynomial 0x1EDC6F41 */
> +#define CRC32CX(crc,value) asm("crc32cx %w[c], %w[c], %x[v]" : [c]"+r"(*&crc) : [v]"r"(+value))
> +#define CRC32CW(crc,value) asm("crc32cw %w[c], %w[c], %w[v]" : [c]"+r"(*&crc) : [v]"r"(+value))
> +#define CRC32CH(crc,value) asm("crc32ch %w[c], %w[c], %w[v]" : [c]"+r"(*&crc) : [v]"r"(+value))
> +#define CRC32CB(crc,value) asm("crc32cb %w[c], %w[c], %w[v]" : [c]"+r"(*&crc) : [v]"r"(+value))
> +
> +pg_crc32c
> +pg_comp_crc32c_arm64(pg_crc32c crc, const void* data, size_t len) {
> + uint64 p0, p1;
> + pg_crc32c crc32_c = crc;
> + long length = len;
> + const unsigned char *p_buf = data;
> +
> + /* Allow crc instructions in asm */
> + asm(".cpu generic+crc");
Hm, this switches it for the rest of the function, program, ...?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-03-01 21:36:23 | Re: Optimize Arm64 crc32c implementation in Postgresql |
Previous Message | Andres Freund | 2018-03-01 21:27:56 | Re: row filtering for logical replication |