Re: [PATCH] SVE popcount support

From: "Malladi, Rama" <ramamalladi(at)hotmail(dot)com>
To: "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, "Rajat(dot)Ma(at)fujitsu(dot)com" <Rajat(dot)Ma(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject: Re: [PATCH] SVE popcount support
Date: 2024-12-09 16:36:58
Message-ID: DS0PR19MB72480630B0534380762E81D3B03C2@DS0PR19MB7248.namprd19.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 12/9/24 12:21 AM, Devanga(dot)Susmitha(at)fujitsu(dot)com wrote:
> Hello,
>
> We are sharing our patch for pg_popcount with SVE support as a
> contribution from our side in this thread. We hope this contribution
> will help in exploring and refining the popcount implementation further.
> Our patch uses the existing infrastructure, i.e. the
> "choose_popcount_functions" method, to determine the correct popcount
> implementation based on the architecture, thereby requiring fewer code
> changes. The patch also includes implementations for popcount and
> popcount masked.
> We can reference both solutions and work together toward achieving the
> most efficient and effective implementation for PostgreSQL.

Thanks for the patch and it looks good. I will review the full patch in
the next couple of days. One observation was that the patch has `xsave`
flags added. This isn't needed.

`pgport_cflags = {'crc': cflags_crc, 'popcnt': cflags_popcnt +
cflags_popcnt_arm, 'xsave': cflags_xsave}`

>
> *Algorithm Overview:*
> 1. For larger inputs, align the buffers to avoid double loads. For
> smaller inputs alignment is not necessary and might even degrade the
> performance.
> 2. Process the aligned buffer chunk by chunk till the last incomplete
> chunk.
> 3. Process the last incomplete chunk.
> *Our setup:*
> Machine: AWS EC2 c7g.8xlarge - 32vcpu, 64gb RAM
> OS : Ubuntu 22.04.5 LTS
> GCC: 11.4
> *Benchmark and Result:*
> We have used PostgreSQL community recommended
> popcount-test-module[0] for benchmarking and observed a speed-up of
> more than 4x for larger buffers. Even for smaller inputs of size 8 and
> 16 bytes there aren't any performance degradations observed.
> Looking forward to your thoughts!
>
I tested the patch and here attached is the performance I see on a
`c7g.xlarge`. The perf data doesn't quite match to what you observe
(especially for 256B). In the chart, I have comparison of baseline, AWS
SVE (what I had implemented) and Fujitsu SVE popcount implementations.
Can you confirm the command-line you had used for the benchmark run?

I had used the below command-line:

`sudo su postgres -c "/usr/local/pgsql/bin/psql -c 'EXPLAIN ANALYZE
SELECT drive_popcount(100000, 16);'"`

>
>
> ------------------------------------------------------------------------
> *From:* Nathan Bossart <nathandbossart(at)gmail(dot)com>
> *Sent:* Wednesday, December 4, 2024 21:37
> *To:* Malladi, Rama <ramamalladi(at)hotmail(dot)com>
> *Cc:* Kirill Reshke <reshkekirill(at)gmail(dot)com>; pgsql-hackers
> <pgsql-hackers(at)postgresql(dot)org>
> *Subject:* Re: [PATCH] SVE popcount support
> On Wed, Dec 04, 2024 at 08:51:39AM -0600, Malladi, Rama wrote:
> > Thank you, Kirill, for the review and the feedback. Please find
> inline my
> > reply and an updated patch.
>
> Thanks for the updated patch.  I have a couple of high-level comments.
> Would you mind adding this to the commitfest system
> (https://commitfest.postgresql.org/
> <https://commitfest.postgresql.org/>) so that it is picked up by our
> automated patch testing tools?
>
> > +# Check for ARMv8 SVE popcount intrinsics
> > +#
> > +CFLAGS_POPCNT=""
> > +PG_POPCNT_OBJS=""
> > +PGAC_SVE_POPCNT_INTRINSICS([])
> > +if test x"$pgac_sve_popcnt_intrinsics" != x"yes"; then
> > +  PGAC_SVE_POPCNT_INTRINSICS([-march=armv8-a+sve])
> > +fi
> > +if test x"$pgac_sve_popcnt_intrinsics" = x"yes"; then
> > +  PG_POPCNT_OBJS="pg_popcount_sve.o"
> > +  AC_DEFINE(USE_SVE_POPCNT_WITH_RUNTIME_CHECK, 1, [Define to 1 to
> use SVE popcount instructions with a runtime check.])
> > +fi
> > +AC_SUBST(CFLAGS_POPCNT)
> > +AC_SUBST(PG_POPCNT_OBJS)
>
> We recently switched some intrinsics support in PostgreSQL to use
> __attribute__((target("..."))) instead of applying special compiler flags
> to specific files (e.g., commits f78667b and 4b03a27).  The hope is that
> this approach will be a little more sustainable as we add more
> architecture-specific code.  IMHO we should do something similar here.
> While this means that older versions of clang might not pick up this
> optimization (see the commit message for 4b03a27 for details), I think
> that's okay because 1) this patch is intended for the next major
> version of
> Postgres, which will take some time for significant adoption, and 2) this
> is brand new code, so we aren't introducing any regressions for current
> users.
>
> > +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
> > +extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (const char
> *buf, int bytes);
>
> Could we combine this with the existing copy above this line? I'm thinking
> of something like
>
>         #if defined(TRY_POPCNT_FAST) ||
> defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)
>         extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (...)
>     #endif
>
>     #ifdef TRY_POPCNT_FAST
>     ...
>
> > +#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
> > +extern uint64 pg_popcount_sve(const char *buf, int bytes);
> > +extern int check_sve_support(void);
> > +#endif
>
> Are we able to use SVE instructions for pg_popcount32(), pg_popcount64(),
> and pg_popcount_masked(), too?
>
> > +static inline uint64
> > +pg_popcount_choose(const char *buf, int bytes)
> > +{
> > +     if (check_sve_support())
> > +             pg_popcount_optimized = pg_popcount_sve;
> > +     else
> > +             pg_popcount_optimized = pg_popcount_slow;
> > +     return pg_popcount_optimized(buf, bytes);
> > +}
> > +
> > +#endif        /* USE_SVE_POPCNT_WITH_RUNTIME_CHECK */
>
> Can we put this code in the existing choose_popcount_functions() function
> in pg_bitutils.c?
>
> > +// check if sve supported
> > +int check_sve_support(void)
> > +{
> > +     // Read ID_AA64PFR0_EL1 register
> > +     uint64_t pfr0;
> > +     __asm__ __volatile__(
> > +     "mrs %0, ID_AA64PFR0_EL1"
> > +     : "=r" (pfr0));
> > +
> > +     // SVE bits are 32-35
> > +     return (pfr0 >> 32) & 0xf;
> > +}
>
> Is this based on some reference code from a manual that we could cite
> here?
> Or better yet, is it possible to do this without inline assembly (e.g.,
> with another intrinsic function)?
>
> > +/*
> > + * pg_popcount_sve
> > + *              Returns the number of 1-bits in buf
> > + */
> > +uint64
> > +pg_popcount_sve(const char *buf, int bytes)
>
> I think this function could benefit from some additional comments to
> explain what is happening at each step.
>
> --
> nathan
>
>

Attachment Content-Type Size
image/png 25.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-12-09 16:37:11 Re: IWYU annotations
Previous Message Andres Freund 2024-12-09 16:31:26 Re: FileFallocate misbehaving on XFS