Re: [PATCH] SVE popcount support

From: "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>
To: "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Rajat(dot)Ma(at)fujitsu(dot)com" <Rajat(dot)Ma(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject: Re: [PATCH] SVE popcount support
Date: 2024-12-11 09:06:15
Message-ID: TY2PR01MB2667B42E4A0F4B1386BD9712973E2@TY2PR01MB2667.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the suggestion; we have removed the `xsave` flag.

We have used the following command for benchmarking:
time ./build_fj/bin/psql pop_db -c "select drive_popcount(10000000, 16);"

We ran it 20 times and took the average to flatten any CPU fluctuations. The results observed on `m7g.4xlarge`are in the attached Excel file.

We have also updated the condition for buffer alignment, skipping the alignment process if the buffer is already aligned. This seems to have improved the performance by a few milliseconds because the input buffer provided by `drive_popcount` is already aligned. PFA for the updated patch file.

Thanks,
Chiranmoy

________________________________
From: Malladi, Rama <ramamalladi(at)hotmail(dot)com>
Sent: Monday, December 9, 2024 10:06 PM
To: Susmitha, Devanga <Devanga(dot)Susmitha(at)fujitsu(dot)com>; Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>; pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>; Bhattacharya, Chiranmoy <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>; M A, Rajat <Rajat(dot)Ma(at)fujitsu(dot)com>; Hajela, Ragesh <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject: Re: [PATCH] SVE popcount support

On 12/9/24 12:21 AM, Devanga(dot)Susmitha(at)fujitsu(dot)com<mailto:Devanga(dot)Susmitha(at)fujitsu(dot)com> wrote:
Hello,

We are sharing our patch for pg_popcount with SVE support as a contribution from our side in this thread. We hope this contribution will help in exploring and refining the popcount implementation further.
Our patch uses the existing infrastructure, i.e. the "choose_popcount_functions" method, to determine the correct popcount implementation based on the architecture, thereby requiring fewer code changes. The patch also includes implementations for popcount and popcount masked.
We can reference both solutions and work together toward achieving the most efficient and effective implementation for PostgreSQL.

Thanks for the patch and it looks good. I will review the full patch in the next couple of days. One observation was that the patch has `xsave` flags added. This isn't needed.

`pgport_cflags = {'crc': cflags_crc, 'popcnt': cflags_popcnt + cflags_popcnt_arm, 'xsave': cflags_xsave}`

Algorithm Overview:
1. For larger inputs, align the buffers to avoid double loads. For smaller inputs alignment is not necessary and might even degrade the performance.
2. Process the aligned buffer chunk by chunk till the last incomplete chunk.
3. Process the last incomplete chunk.
Our setup:
Machine: AWS EC2 c7g.8xlarge - 32vcpu, 64gb RAM
OS : Ubuntu 22.04.5 LTS
GCC: 11.4

Benchmark and Result:
We have used PostgreSQL community recommended popcount-test-module[0] for benchmarking and observed a speed-up of more than 4x for larger buffers. Even for smaller inputs of size 8 and 16 bytes there aren't any performance degradations observed.
Looking forward to your thoughts!

I tested the patch and here attached is the performance I see on a `c7g.xlarge`. The perf data doesn't quite match to what you observe (especially for 256B). In the chart, I have comparison of baseline, AWS SVE (what I had implemented) and Fujitsu SVE popcount implementations. Can you confirm the command-line you had used for the benchmark run?

I had used the below command-line:

`sudo su postgres -c "/usr/local/pgsql/bin/psql -c 'EXPLAIN ANALYZE SELECT drive_popcount(100000, 16);'"`

________________________________
From: Nathan Bossart <nathandbossart(at)gmail(dot)com><mailto:nathandbossart(at)gmail(dot)com>
Sent: Wednesday, December 4, 2024 21:37
To: Malladi, Rama <ramamalladi(at)hotmail(dot)com><mailto:ramamalladi(at)hotmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com><mailto:reshkekirill(at)gmail(dot)com>; pgsql-hackers <pgsql-hackers(at)postgresql(dot)org><mailto:pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] SVE popcount support

On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote:
> Thank you, Kirill, for the review and the feedback. Please find inline my
> reply and an updated patch.

Thanks for the updated patch. I have a couple of high-level comments.
Would you mind adding this to the commitfest system
(https://commitfest.postgresql.org/) so that it is picked up by our
automated patch testing tools?

> +# Check for ARMv8 SVE popcount intrinsics
> +#
> +CFLAGS_POPCNT=""
> +PG_POPCNT_OBJS=""
> +PGAC_SVE_POPCNT_INTRINSICS([])
> +if test x"$pgac_sve_popcnt_intrinsics" != x"yes"; then
> + PGAC_SVE_POPCNT_INTRINSICS([-march=armv8-a+sve])
> +fi
> +if test x"$pgac_sve_popcnt_intrinsics" = x"yes"; then
> + PG_POPCNT_OBJS="pg_popcount_sve.o"
> + AC_DEFINE(USE_SVE_POPCNT_WITH_RUNTIME_CHECK, 1, [Define to 1 to use SVE popcount instructions with a runtime check.])
> +fi
> +AC_SUBST(CFLAGS_POPCNT)
> +AC_SUBST(PG_POPCNT_OBJS)

We recently switched some intrinsics support in PostgreSQL to use
__attribute__((target("..."))) instead of applying special compiler flags
to specific files (e.g., commits f78667b and 4b03a27). The hope is that
this approach will be a little more sustainable as we add more
architecture-specific code. IMHO we should do something similar here.
While this means that older versions of clang might not pick up this
optimization (see the commit message for 4b03a27 for details), I think
that's okay because 1) this patch is intended for the next major version of
Postgres, which will take some time for significant adoption, and 2) this
is brand new code, so we aren't introducing any regressions for current
users.

> +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
> +extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (const char *buf, int bytes);

Could we combine this with the existing copy above this line? I'm thinking
of something like

#if defined(TRY_POPCNT_FAST) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)
extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (...)
#endif

#ifdef TRY_POPCNT_FAST
...

> +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
> +extern uint64 pg_popcount_sve(const char *buf, int bytes);
> +extern int check_sve_support(void);
> +#endif

Are we able to use SVE instructions for pg_popcount32(), pg_popcount64(),
and pg_popcount_masked(), too?

> +static inline uint64
> +pg_popcount_choose(const char *buf, int bytes)
> +{
> + if (check_sve_support())
> + pg_popcount_optimized = pg_popcount_sve;
> + else
> + pg_popcount_optimized = pg_popcount_slow;
> + return pg_popcount_optimized(buf, bytes);
> +}
> +
> +#endif /* USE_SVE_POPCNT_WITH_RUNTIME_CHECK */

Can we put this code in the existing choose_popcount_functions() function
in pg_bitutils.c?

> +// check if sve supported
> +int check_sve_support(void)
> +{
> + // Read ID_AA64PFR0_EL1 register
> + uint64_t pfr0;
> + __asm__ __volatile__(
> + "mrs %0, ID_AA64PFR0_EL1"
> + : "=r" (pfr0));
> +
> + // SVE bits are 32-35
> + return (pfr0 >> 32) & 0xf;
> +}

Is this based on some reference code from a manual that we could cite here?
Or better yet, is it possible to do this without inline assembly (e.g.,
with another intrinsic function)?

> +/*
> + * pg_popcount_sve
> + * Returns the number of 1-bits in buf
> + */
> +uint64
> +pg_popcount_sve(const char *buf, int bytes)

I think this function could benefit from some additional comments to
explain what is happening at each step.

--
nathan

Attachment Content-Type Size
FJ - AWS Comparison.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 20.8 KB
v2-0001-SVE-support-for-popcount-and-popcount-masked.patch application/octet-stream 19.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-12-11 09:11:27 Re: Memory leak in pg_logical_slot_{get,peek}_changes
Previous Message Andrei Lepikhov 2024-12-11 09:02:32 Re: Removing unneeded self joins