pg_basebackup from REL_12_STABLE hangs on solaris/sparc

From: Victor Wagner <vitus(at)wagner(dot)pp(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pg_basebackup from REL_12_STABLE hangs on solaris/sparc
Date: 2019-10-01 14:04:03
Message-ID: 20191001170403.7ab384cb@fafnir.local.vm
Lists: pgsql-hackers

Colleagues,

I've encountered the following problem on an old Sparc64 machine running
Solaris 10:

When I compile PostgreSQL 12 with --enable-tap-tests and run make check
in src/bin, the test src/bin/pg_basebackup/t/010_pg_basebackup.pl
hangs indefinitely.

I tried attaching gdb to the hanging process, but when I attempt to
get a backtrace, gdb reports that the stack is corrupt:

Attaching to program `/home/vitus/postgrespro/src/bin/pg_basebackup/pg_basebackup', process 1467
[New process 1467]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
Reading symbols from /usr/lib/sparcv9/ld.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sparcv9/ld.so.1
---Type <return> to continue, or q <return> to quit---
0x00000000ff2cca38 in ?? ()
(gdb) bt
#0 0x00000000ff2cca38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

When I afterwards kill the hung process with
kill -SEGV to get a core, I get the following stack trace from the core file:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xffffffff7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
(gdb) bt
#0 0xffffffff7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
#1 0xffffffff7d5c8c8c in __sigtimedwait () from /lib/64/libc.so.1
#2 0xffffffff7d5c0628 in __posix_sigwait () from /lib/64/libc.so.1
#3 0xffffffff7f3362b4 in pq_reset_sigpipe (osigset=0xffffffff7fffeb1c, sigpipe_pending=false, got_epipe=true) at fe-secure.c:529
#4 0xffffffff7f336084 in pqsecure_raw_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:399
#5 0xffffffff7f335e28 in pqsecure_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:316
#6 0xffffffff7f326c54 in pqSendSome (conn=0x100135e90, len=5) at fe-misc.c:876
#7 0xffffffff7f326e84 in pqFlush (conn=0x100135e90) at fe-misc.c:1004
#8 0xffffffff7f316584 in sendTerminateConn (conn=0x100135e90) at fe-connect.c:4031
#9 0xffffffff7f3165a4 in closePGconn (conn=0x100135e90) at fe-connect.c:4049
#10 0xffffffff7f31663c in PQfinish (conn=0x100135e90) at fe-connect.c:4083
#11 0x000000010000bc64 in BaseBackup () at pg_basebackup.c:2136
#12 0x000000010000d7ec in main (argc=4, argv=0xffffffff7ffff808) at pg_basebackup.c:2547

This happens on random tests in this test file with a probability of about
1/10, but because there are more than 100 tests, a hang somewhere in the
file is practically certain. The other two test files in the
src/bin/pg_basebackup directory don't hang.
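
As a rough check of that estimate, the chance of at least one hang in n
tests is 1 - (1 - p)^n if each test hangs with probability p. Treating
the tests as independent is an assumption made here for illustration,
not something measured, and hang_probability is a hypothetical helper:

```c
/*
 * Hypothetical helper: probability of at least one hang in n tests,
 * assuming each test hangs independently with probability p.
 */
double
hang_probability(double p, int n)
{
    double none = 1.0;      /* probability that no test has hung so far */

    for (int i = 0; i < n; i++)
        none *= 1.0 - p;

    return 1.0 - none;
}
```

With p = 0.1 and n = 100 this gives about 0.99997, which matches the
observation that the file never completes.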

As far as I can tell, there are only two Solaris machines in the
pgbuildfarm now, and neither of them has any record of running the
REL_12_STABLE branch (not to mention that neither runs TAP tests).

--
