Re: AIX support

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Srirama Kucherlapati <sriram(dot)rk(at)in(dot)ibm(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Noah Misch <noah(at)leadboat(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "tvk1271(at)gmail(dot)com" <tvk1271(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tristan Partin <tristan(at)neon(dot)tech>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, "postgres-ibm-aix(at)wwpdl(dot)vnet(dot)ibm(dot)com" <postgres-ibm-aix(at)wwpdl(dot)vnet(dot)ibm(dot)com>
Subject: Re: AIX support
Date: 2025-04-07 10:04:39
Message-ID: 19422eb3-54dc-4afb-8046-5eee906edacd@iki.fi
Lists: pgsql-hackers

On 05/04/2025 21:29, Srirama Kucherlapati wrote:
>>> - WRT the MEMSET_LOOP_LIMIT flag, this is set to “0”, which would
>>> internally use
>
>> Yes, I understand what it does. But why? Whatever benchmarking was done
>> back in 2006 is no longer relevant.
>
> We ran the program mentioned in the link below and collected
> benchmark stats on our node (POWER_10).
>
> https://postgrespro.com/list/thread-id/1673194
>
> The native AIX memset() seems to perform better. The benchmark still
> seems to be relevant, so I think we should continue to use the existing
> optimization for AIX.

At least it needs to be updated to match what MemSet() looks like
nowadays. The changes may be just cosmetic, but better check. Should
also check the effect on MemSetAligned(). That might matter more for
performance in practice.

A third thing to check is the performance of MemSet() when the pointer
is, in fact, aligned.
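
For reference when updating the benchmark, here is a simplified sketch of the decision MemSet() makes. It is modeled loosely on the macro in c.h, not copied from it, and the sketch_* names are hypothetical. The word loop is taken only when the pointer and the length are both long-aligned, the fill value is zero, and the length is within the limit; everything else falls through to libc memset(). With the limit set to 0, as in the AIX template, the whole condition is false at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the configure-time MEMSET_LOOP_LIMIT.
 * Build with -DSKETCH_MEMSET_LOOP_LIMIT=0 to mimic the AIX template. */
#ifndef SKETCH_MEMSET_LOOP_LIMIT
#define SKETCH_MEMSET_LOOP_LIMIT 1024
#endif

#define SKETCH_LONG_ALIGN_MASK (sizeof(long) - 1)

/* Simplified model of MemSet(): zero word-at-a-time only when pointer and
 * length are long-aligned, val is 0 and len is within the limit. */
static void
sketch_memset(void *start, int val, size_t len)
{
    if (((uintptr_t) start & SKETCH_LONG_ALIGN_MASK) == 0 &&
        (len & SKETCH_LONG_ALIGN_MASK) == 0 &&
        val == 0 &&
        len <= SKETCH_MEMSET_LOOP_LIMIT &&
        SKETCH_MEMSET_LOOP_LIMIT != 0)  /* a limit of 0 disables the loop */
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, val, len);        /* unaligned, nonzero or too long */
}
```

Timing this once with an aligned buffer and once with an offset such as buf + 1 would cover both the aligned and the unaligned case.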

The other question is what do the results look like on other platforms?
How much difference does the libc implementation make, vs. the compiler
and CPU architecture? If the difference is related to compiler or CPU
architecture, then this doesn't belong in the AIX template, but
somewhere else.

> Below are the stats (64-bit object mode).
>
>>> ./memset-aix
>
>         sizeof(int)  = 4
>         sizeof(long) = 8

MemSet() uses 'long', so the int tests are not relevant. I have omitted
them below.

>         memset by int (size=8) : 0.280301
>         Loop by long (size=8) : 0.202650
>
>         memset by int (size=16) : 0.280979
>         Loop by long (size=16) : 0.246879
>
>         memset by int (size=32) : 0.331691
>         Loop by long (size=32) : 0.422261

OK, so MemSet() is faster with very small sizes; the crossover is somewhere
between 16 and 32 bytes.

I'm actually surprised the compiler doesn't replace the memset() call
with a few store instructions with these sizes.

>         memset by int (size=1024) : 0.904048
>         Loop by long (size=1024) : 24.149871

So with larger sizes, memset() wins hands down.

I'm surprised how big the difference is, because I actually expected the
compiler to detect the memory-zeroing loop and replace it with some
fancy vector instructions (does powerpc have any?). Or a call to
memset(); I've seen compilers convert loops to memset() and vice versa.

My gut feeling is actually that we should remove the MemSet() macro
altogether and just use memset() everywhere. The compilers are much
better at optimizing it in year 2025 than they were back in 2002. I'd
love to see some rigorous benchmarks across different platforms and
compilers to demonstrate that, and then just get rid of MemSet().

MemSetAligned() might still be worth keeping. Sometimes we know that a
piece of memory is aligned, but the compiler does not. But maybe even
that should just assert and hint the compiler that the input is aligned,
and then call memset().
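
A minimal sketch of that idea, assuming GCC or Clang: check the alignment the caller promises in assert-enabled builds, pass the promise to the compiler with the __builtin_assume_aligned() builtin, and let memset() do the work. memset_aligned_hint() is a hypothetical name, not an existing PostgreSQL function, and other compilers would need a fallback without the builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement for a MemSetAligned()-style macro.
 * The caller guarantees that 'start' is aligned to sizeof(long). */
static inline void *
memset_aligned_hint(void *start, int val, size_t len)
{
    /* Verify the caller's promise in assert-enabled builds. */
    assert(((uintptr_t) start & (sizeof(long) - 1)) == 0);

    /* GCC/Clang builtin: returns 'start' and lets the compiler assume it
     * is at least sizeof(long)-aligned when expanding the memset(). */
    start = __builtin_assume_aligned(start, sizeof(long));
    return memset(start, val, len);
}
```

With -DNDEBUG only the hint remains. Whether it changes the generated code for a given size is something to confirm by looking at the assembly, not something this sketch guarantees.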

If you'd like to help the community in general, it would be much
appreciated if you could do some more rigorous benchmarking along those
lines, not just for AIX, and start a new thread to discuss it. That
would be the best way to resolve this.
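
As a starting point for such a cross-platform comparison, a harness along these lines could be built unchanged with each compiler and libc of interest. This is only a sketch: zero_by_long(), time_zeroing() and report() are hypothetical names, timing uses standard C clock() (process CPU time), and the empty asm statement is a GCC/Clang-specific compiler barrier that keeps the repeated stores from being optimized away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Word-at-a-time zeroing loop, in the spirit of the "Loop by long" case.
 * Assumes 'start' is long-aligned and 'len' is a multiple of sizeof(long). */
static void
zero_by_long(void *start, size_t len)
{
    long *p = (long *) start;
    long *stop = (long *) ((char *) start + len);

    while (p < stop)
        *p++ = 0;
}

/* Zero 'buf' 'iters' times with libc memset() (use_libc != 0) or with
 * zero_by_long(); returns the CPU seconds used. */
static double
time_zeroing(long *buf, size_t len, long iters, int use_libc)
{
    clock_t t0 = clock();

    for (long i = 0; i < iters; i++)
    {
        if (use_libc)
            memset(buf, 0, len);
        else
            zero_by_long(buf, len);
        /* GCC/Clang barrier: the "memory" clobber makes the compiler
         * assume the buffer is observed, so the stores are not dropped. */
        __asm__ __volatile__("" : : "r"(buf) : "memory");
    }
    return (double) (clock() - t0) / CLOCKS_PER_SEC;
}

/* Print one pair of timings, similar in shape to the quoted output. */
static void
report(long *buf, size_t len, long iters)
{
    assert(len % sizeof(long) == 0);
    printf("memset (size=%zu) : %f\n", len, time_zeroing(buf, len, iters, 1));
    printf("loop by long (size=%zu) : %f\n", len, time_zeroing(buf, len, iters, 0));
}
```

A main() calling report() for sizes 8, 16, 32 and up to 1024 on a 1024-byte long array, compiled with the same optimization flags as the server, would produce numbers comparable to the table quoted here. Running the same source on AIX and on another ppc64 operating system with the same compiler would help separate libc effects from compiler effects.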

For the narrower question of what the AIX template should do, that
comes down to whether there's some *AIX-specific* performance
difference. The generated powerpc assembly code is presumably the same
on AIX and other operating systems, so it comes down to whether there's
some big difference in AIX's memset() implementation vs. glibc's.

>> diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
>
>> Why is this change needed?
>
>> Yes, I know we've been over this many times already. I still don't
>> understand why it's needed. The onus is on you to explain it adequately,
>> in comments in the patch, so that I and others understand it. Or even
>> better, remove it if it's not necessary.
>
> If you recall, we previously considered replacing this assembly code
> with __sync_lock_test_and_set(). However, as you mentioned earlier,
> this should be handled in a separate patch. For now, I'll make a
> note and submit a separate patch for this later, as originally
> planned. Below is the reference to older discussion.

Yes, I do recall. Please read again my comment above: this all needs to
be explained in comments in the code.

To be precise, I have these questions:

- Does GCC on AIX (still) use the IBM assembler?
- Does the IBM assembler still not understand the label syntax?
- Is there some other label syntax that would work on the IBM assembler?
- Is it possible to use the GNU assembler instead?
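
For comparison with the hand-written assembly, the __sync_lock_test_and_set() alternative mentioned in the quoted text would look roughly like the sketch below. It relies only on the legacy GCC __sync builtins: per the GCC manual, __sync_lock_test_and_set() is an acquire barrier and __sync_lock_release() a release barrier. The sketch_* type and function names are hypothetical, not the definitions in s_lock.h. Because the compiler emits the instruction sequence itself, no hand-written local labels are involved. Whether the generated code performs as well as the existing PowerPC assembly on AIX would still need to be measured.

```c
#include <assert.h>

/* Hypothetical lock type for illustration only. */
typedef volatile int sketch_slock_t;

/* Atomically store 1 and return the previous value (acquire semantics).
 * A return value of 0 means the caller now holds the lock. */
static inline int
sketch_tas(sketch_slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Store 0 with release semantics. */
static inline void
sketch_unlock(sketch_slock_t *lock)
{
    __sync_lock_release(lock);
}

/* Naive spin: retry TAS until it succeeds (real code would back off). */
static void
sketch_spin_lock(sketch_slock_t *lock)
{
    while (sketch_tas(lock))
        ;
}
```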

>>> +# -blibpath must contain ALL directories where we should look for libraries
>>> +libpath := $(shell echo $(subst -L,:,$(filter -L/%,$(LDFLAGS))) | sed -e's/ //g'):/usr/lib:/lib
>
>> Is this still sensible on modern AIX systems? What happens if you leave
>> it out?
>
> This is required because it tells the runtime linker which non-default
> directories to search for libraries. It is used along with
> rpath. As suggested, I tested this by removing the libpath, but at
> run time the linker was not able to find the dependent libraries, and
> as a result the binaries failed to load. After doing some
> research: AIX uses a stricter, more *explicit* approach. The runtime
> linker expects to be told exactly where to look, using -blibpath.

Ok, some comments would be in order to explain that, maybe with links to
the relevant AIX documentation.

--
Heikki Linnakangas
Neon (https://neon.tech)
