Re: Server crash on RHEL 9/s390x platform against PG16

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash on RHEL 9/s390x platform against PG16
Date: 2023-10-23 04:06:36
Message-ID: CAF1DzPU_QUXO4S_jAcRJs+1O1GzNVKDe6KWHiv2Bz7HSHiz-vA@mail.gmail.com

On Sat, Oct 21, 2023 at 5:17 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2023-09-12 15:27:21 +0530, Suraj Kharage wrote:
> > *[edb(at)9428da9d2137 postgres]$ cat /etc/redhat-release
> > AlmaLinux release 9.2 (Turquoise Kodkod)
> > [edb(at)9428da9d2137 postgres]$ lscpu
> > Architecture: s390x
> > CPU op-mode(s): 32-bit, 64-bit
> > Address sizes: 39 bits
>
> Can you provide the rest of the lscpu output? There have been issues with
> Z14 vs Z15:
> https://github.com/llvm/llvm-project/issues/53009
>
> You're apparently not hitting that, but given that fact, you either are on a
> slightly older CPU, or you have applied a patch to work around it. Because
> otherwise your build instructions below would hit that problem, I think.
>
>
> > physical, 48 bits virtual
> > Byte Order: Big Endian*
> > *Configure command:*
> > ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm
> > --with-perl --with-python --with-tcl --with-openssl --enable-nls
> > --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu
> > --enable-debug --enable-cassert --with-pgport=5414
>
> Hm, based on "--with-libcurl" this isn't upstream postgres, correct? Have you
> verified the issue reproduces on upstream postgres?
>

Yes, I can reproduce this on upstream postgres, on both master and the v16
branch.

Here are the details:

./configure --prefix=/home/edb/postgres/ --with-zstd --with-llvm
--with-perl --with-python --with-tcl --with-openssl --enable-nls
--with-libxml --with-libxslt --with-systemd --without-icu --enable-debug
--enable-cassert --with-pgport=5414 CFLAGS="-g -O0"

[edb(at)9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

[edb(at)9428da9d2137 edbas]$ lscpu
Architecture: s390x
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Big Endian
CPU(s): 9
On-line CPU(s) list: 0-8
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
CPU family: 6
Model: 158
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 9
Stepping: 10
BogoMIPS: 5200.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx
pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni
pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2
movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
3dnowprefetch fsgsbase bmi1 avx2 bmi2 erms xsaveopt arat
Caches (sum of all):
L1d: 288 KiB (9 instances)
L1i: 288 KiB (9 instances)
L2: 2.3 MiB (9 instances)
L3: 108 MiB (9 instances)
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX unsupported
L1tf: Mitigation; PTE Inversion
Mds: Vulnerable; SMT Host state unknown
Meltdown: Vulnerable
Mmio stale data: Vulnerable
Spec store bypass: Vulnerable
Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers
only; no swapgs barriers
Spectre v2: Vulnerable, STIBP: disabled
Srbds: Unknown: Dependent on hypervisor status
Tsx async abort: Not affected

[edb(at)9428da9d2137 postgres]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

[edb(at)9428da9d2137 postgres]$ rpm -qa | grep llvm
llvm-libs-15.0.7-1.el9.s390x
llvm-15.0.7-1.el9.s390x
llvm-test-15.0.7-1.el9.s390x
llvm-static-15.0.7-1.el9.s390x
llvm-devel-15.0.7-1.el9.s390x

Please let me know if any further information is required.
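
For anyone who wants to gather the same details on another machine, here is
a minimal sketch that runs the commands shown above in one go. The helper
names run_if_present and collect_env are hypothetical (my own, not part of
any tool), and the "== ... ==" headers are only there for readability; the
commands themselves are the ones used in this message.

```shell
#!/bin/sh
# Sketch only: run_if_present and collect_env are hypothetical helper names.

# Print a header, then run the command if its program is installed.
run_if_present() {
    printf '== %s ==\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        # Capture stderr too; ignore a non-zero exit so the report continues.
        "$@" 2>&1 || true
    else
        printf '(%s not found)\n' "$1"
    fi
}

# Run the same four checks as in this message, in the same order.
collect_env() {
    run_if_present cat /etc/redhat-release
    run_if_present lscpu
    run_if_present clang --version
    printf '== rpm -qa | grep llvm ==\n'
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa | grep llvm || printf '(no llvm packages found)\n'
    else
        printf '(rpm not found)\n'
    fi
    return 0
}
```

Calling `collect_env > env-report.txt` would write everything to a single
file that can be attached to a reply.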

> >
> > *Test case:*
> > CREATE TABLE rm32044_t1
> > (
> > pkey integer,
> > val text
> > );
> > CREATE TABLE rm32044_t2
> > (
> > pkey integer,
> > label text,
> > hidden boolean
> > );
> > CREATE TABLE rm32044_t3
> > (
> > pkey integer,
> > val integer
> > );
> > CREATE TABLE rm32044_t4
> > (
> > pkey integer
> > );
> > insert into rm32044_t1 values ( 1 , 'row1');
> > insert into rm32044_t1 values ( 2 , 'row2');
> > insert into rm32044_t2 values ( 1 , 'hidden', true);
> > insert into rm32044_t2 values ( 2 , 'visible', false);
> > insert into rm32044_t3 values (1 , 1);
> > insert into rm32044_t3 values (2 , 1);
> >
> > postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
> > = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
> > rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
> > The connection to the server was lost. Attempting reset: Failed.
>
> I tried this on both master and 16, without hitting this issue.
>
> If you can reproduce the issue on upstream postgres, can you share more about
> your configuration?
>
> Greetings,
>
> Andres Freund
>

--

Thanks & Regards,
Suraj Kharage,

edbpostgres.com
