From: | Victor Wagner <vitus(at)wagner(dot)pp(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | pg_basebackup from REL_12_STABLE hands on solaris/sparch |
Date: | 2019-10-01 14:04:03 |
Message-ID: | 20191001170403.7ab384cb@fafnir.local.vm |
Lists: | pgsql-hackers |
Colleagues,
I've encountered the following problem on an old Sparc64 machine running
Solaris 10:
When I compile PostgreSQL 12 with --enable-tap-tests and run make check
in src/bin, the test src/bin/pg_basebackup/t/010_pg_basebackup.pl
hangs indefinitely.
I've tried to attach gdb to the hanging process, but when I attempt to
get a backtrace, gdb reports that the stack is corrupt:
Attaching to program `/home/vitus/postgrespro/src/bin/pg_basebackup/pg_basebackup', process 1467
[New process 1467]
Retry #1:
Retry #2:
Retry #3:
Retry #4:
Reading symbols from /usr/lib/sparcv9/ld.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/sparcv9/ld.so.1
---Type <return> to continue, or q <return> to quit---
0x00000000ff2cca38 in ?? ()
(gdb) bt
#0 0x00000000ff2cca38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
When I afterwards kill the hung process with
kill -SEGV to get a core, I get the following stack trace from the core file:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xffffffff7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
(gdb) bt
#0 0xffffffff7d5d94ac in ___sigtimedwait () from /lib/64/libc.so.1
#1 0xffffffff7d5c8c8c in __sigtimedwait () from /lib/64/libc.so.1
#2 0xffffffff7d5c0628 in __posix_sigwait () from /lib/64/libc.so.1
#3 0xffffffff7f3362b4 in pq_reset_sigpipe (osigset=0xffffffff7fffeb1c, sigpipe_pending=false, got_epipe=true) at fe-secure.c:529
#4 0xffffffff7f336084 in pqsecure_raw_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:399
#5 0xffffffff7f335e28 in pqsecure_write (conn=0x100135e90, ptr=0x10013a370, len=5) at fe-secure.c:316
#6 0xffffffff7f326c54 in pqSendSome (conn=0x100135e90, len=5) at fe-misc.c:876
#7 0xffffffff7f326e84 in pqFlush (conn=0x100135e90) at fe-misc.c:1004
#8 0xffffffff7f316584 in sendTerminateConn (conn=0x100135e90) at fe-connect.c:4031
#9 0xffffffff7f3165a4 in closePGconn (conn=0x100135e90) at fe-connect.c:4049
#10 0xffffffff7f31663c in PQfinish (conn=0x100135e90) at fe-connect.c:4083
#11 0x000000010000bc64 in BaseBackup () at pg_basebackup.c:2136
#12 0x000000010000d7ec in main (argc=4, argv=0xffffffff7ffff808) at pg_basebackup.c:2547
This happens on random tests in this test file with a probability of about
1/10, but because there are more than 100 tests, a hang somewhere in the
file is practically certain. The other two test files in the
src/bin/pg_basebackup directory don't hang.
As far as I can see, there are only two Solaris machines in the
pgbuildfarm now, and neither of them has any record of running the
REL_12_STABLE branch (not to mention that neither runs the TAP tests).
--