Re: hung postmaster?

From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: hung postmaster?
Date: 2005-02-19 00:21:44
Message-ID: 200502181721.44404.pgsql@bluepolka.net
Lists: pgsql-general

OK, it appears I can reproduce this bug in fairly short
order. Below are gdb backtraces along with current
snapshots from ps, netstat, and a snippet of the server
log. This is no longer an urgent issue for me with the
gcc 3.4.2 workaround available, but I do have a stalled
test cluster postmaster right now, so I can leave it up
for a while if anyone cares for more information.

The identical source built in the identical fashion and
running on the same hardware, but using gcc 3.4.2
instead of gcc 3.3.2, continues to work fine and does
not exhibit this problem so far. Again, this is 64-bit
PostgreSQL 7.4.6 on HP-UX B.11.23 on an ia64 box.

Details of current hang...

PIDs 29080 and 26752 in the listing below are hung,
apparently because the postmaster is hung (PID 28775).
PID 26752 is a remote psql client that wanted to just
connect, select version(), and disconnect. I had
that going in a loop, and this PID was the first to
hang; PID 29080 is a local psql client.
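
The connect/query/disconnect loop described above could be scripted roughly as follows. This is only a sketch, not the script actually used: first_hang, PSQL_CMD, and the host/user/database values in it are hypothetical placeholders. It treats any attempt that runs longer than a timeout as a hang.

```python
import subprocess
import time

# Hypothetical client command; host, user, and database are placeholders.
PSQL_CMD = ["psql", "-h", "dbhost", "-U", "pg", "-d", "pg",
            "-c", "select version()"]


def first_hang(cmd, attempts, timeout, pause=1.0):
    """Run cmd up to `attempts` times.

    Returns the 1-based number of the first attempt that ran longer than
    `timeout` seconds, or None if every attempt finished in time.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            return attempt
        time.sleep(pause)
    return None
```

Calling first_hang(PSQL_CMD, attempts=1000, timeout=60) would report the first attempt that stalls. Because the timed-out client is killed, this variant does not leave a hung psql around to attach gdb to; it only shows how soon the stall appears.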

$ps -u pg -lf
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME COMD
1401 S pg 28777 28775 0 154 20 e000000170fcf4c0 1162 e000000164f7e0c0 13:41:07 pts/3 0:00 postgres: stats buffer process
1401 S pg 28775 1 0 154 20 e00000016a5b5940 1118 e0000001744340e8 13:41:07 pts/3 0:00 /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/postm
1401 S pg 26752 28629 0 154 20 e000000191e03280 101 e000000164f7e100 18:09:01 pts/3 0:00 psql -l
401 R pg 7130 26918 1 178 20 e00000016adac4c0 68 - 18:29:42 pts/11 0:00 ps -u pg -lf
1401 S pg 28778 28777 0 154 20 e00000016c591700 1130 e000000164f7e100 13:41:07 pts/3 0:00 postgres: stats collector process
401 S pg 28629 4112 0 158 20 e00000016aecc940 333 e000000170f9c000 13:40:53 pts/3 0:00 -sh
401 S pg 26918 26887 0 158 20 e00000016c2a0280 351 e000000171506000 18:09:24 pts/11 0:00 -sh
1421 T pg 29080 26918 0 154 20 e00000016a90f940 101 - 18:13:25 pts/11 0:00 psql -c select version()

$which postmaster
/opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/postmaster

$gdb `which postmaster`
HP gdb 5.0 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) attach 28775
Attaching to program: /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/postmaster, process 28775
Reading symbols from /usr/lib/hpux64/libxnet.so.1...done.
Reading symbols from /usr/lib/hpux64/libc.so.1...done.
Reading symbols from /usr/lib/hpux64/libgen.so.1...done.
Reading symbols from /usr/lib/hpux64/libdl.so.1...done.
Reading symbols from /usr/lib/hpux64/libnsl.so.1...done.
Reading symbols from /usr/lib/hpux64/libm.so.1...done.
Reading symbols from /usr/lib/hpux64/libxti.so.1...done.
Reading symbols from /usr/lib/hpux64/libnss_files.so.1...done.
0xc000000000304230:0 in _accept_sys+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) bt
#0 0xc000000000304230:0 in _accept_sys+0x30 () from /usr/lib/hpux64/libc.so.1
#1 0xc0000000003100b0:0 in accept+0x150 () from /usr/lib/hpux64/libc.so.1
#2 0xc000000001aac450:0 in accept+0x70 () from /usr/lib/hpux64/libxnet.so.1
#3 0x4000000000275df0:0 in StreamConnection+0x40 ()
#4 0x40000000002e7da0:0 in ConnCreate+0x80 ()
#5 0x40000000002e6530:0 in ServerLoop+0x3b0 ()
#6 0x40000000002e5740:0 in PostmasterMain+0x1300 ()
#7 0x4000000000279800:0 in main+0x520 ()
(gdb) p debug_query_string
$1 = 0
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) Detaching from program: /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/postmaster, process 28775

$which psql
/opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/psql

$gdb `which psql`
HP gdb 5.0 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) attach 26752
Attaching to program: /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/psql, process 26752
Reading symbols from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3...done.
Reading symbols from /usr/lib/hpux64/libxnet.so.1...done.
Reading symbols from /usr/lib/hpux64/libc.so.1...done.
Reading symbols from /usr/lib/hpux64/libgen.so.1...done.
Reading symbols from /usr/lib/hpux64/libdl.so.1...done.
Reading symbols from /usr/lib/hpux64/libnsl.so.1...done.
Reading symbols from /usr/lib/hpux64/libm.so.1...done.
Reading symbols from /usr/lib/hpux64/libxti.so.1...done.
0xc000000000301e70:0 in _poll_sys+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) bt
#0 0xc000000000301e70:0 in _poll_sys+0x30 () from /usr/lib/hpux64/libc.so.1
#1 0xc000000000313110:0 in poll+0x150 () from /usr/lib/hpux64/libc.so.1
#2 0xc00000002d36ee70:0 in pqSocketPoll+0x110 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#3 0xc00000002d36ec40:0 in pqSocketCheck+0x80 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#4 0xc00000002d36eac0:0 in pqWaitTimed+0x40 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#5 0xc00000002d362f60:0 in connectDBComplete+0xe0 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#6 0xc00000002d362500:0 in PQsetdbLogin+0x410 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#7 0x40000000000218f0:0 in main+0x510 ()
(gdb) p debug_query_string
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) Detaching from program: /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/psql, process 26752

$gdb `which psql`
HP gdb 5.0 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) attach 29080
Attaching to program: /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/psql, process 29080
Reading symbols from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3...done.
Reading symbols from /usr/lib/hpux64/libxnet.so.1...done.
Reading symbols from /usr/lib/hpux64/libc.so.1...done.
Reading symbols from /usr/lib/hpux64/libgen.so.1...done.
Reading symbols from /usr/lib/hpux64/libdl.so.1...done.
Reading symbols from /usr/lib/hpux64/libnsl.so.1...done.
Reading symbols from /usr/lib/hpux64/libm.so.1...done.
Reading symbols from /usr/lib/hpux64/libxti.so.1...done.
0xc000000000301e70:0 in _poll_sys+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) bt
#0 0xc000000000301e70:0 in _poll_sys+0x30 () from /usr/lib/hpux64/libc.so.1
#1 0xc000000000313110:0 in poll+0x150 () from /usr/lib/hpux64/libc.so.1
#2 0xc00000002d36ee70:0 in pqSocketPoll+0x110 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#3 0xc00000002d36ec40:0 in pqSocketCheck+0x80 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#4 0xc00000002d36eac0:0 in pqWaitTimed+0x40 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#5 0xc00000002d362f60:0 in connectDBComplete+0xe0 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#6 0xc00000002d362500:0 in PQsetdbLogin+0x410 ()
from /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/lib/libpq.so.3
#7 0x40000000000218f0:0 in main+0x510 ()
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) Detaching from program: /opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/psql, process 29080
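
Both psql backtraces above show the same call chain, from main through PQsetdbLogin and connectDBComplete down to poll, meaning both clients are still waiting for the connection to complete. A small sketch for comparing such stacks mechanically follows. The regular expression is an assumption based only on the frame layout printed by this HP gdb session, and frame_names is a hypothetical helper name.

```python
import re

# Frame layout assumed from the session above, for example:
#   #0 0xc000000000301e70:0 in _poll_sys+0x30 () from /usr/lib/hpux64/libc.so.1
FRAME = re.compile(
    r"^#(\d+)\s+0x[0-9a-fA-F]+:\d+ in ([A-Za-z_]\w*)\+0x[0-9a-fA-F]+ \(\)"
)


def frame_names(backtrace):
    """Return the function names from a gdb 'bt' listing, innermost first.

    Lines that are not frame lines (such as wrapped 'from ...' lines)
    are skipped.
    """
    names = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            names.append(match.group(2))
    return names
```

Two saved backtraces describe the same wait when frame_names(bt_a) == frame_names(bt_b).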

$uname -a
HP-UX ... B.11.23 ... ia64 ...

$file `which postmaster`
/opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/postmaster: ELF-64 executable object file - IA64

$file `which psql`
/opt/pgsql/installs/postgresql-7.4.6-gcc3.3.2-B.11.23/bin/psql: ELF-64 executable object file - IA64

$psql -V
psql (PostgreSQL) 7.4.6

$postmaster -V
postmaster (PostgreSQL) 7.4.6

This is the tail end of the server log, showing nothing
was ever logged for the hung connections...

2005-02-18 14:25:45.558 [20313] LOG: connection received: host=10.0.1.80 port=45976
2005-02-18 14:25:45.921 [20313] LOG: connection authorized: user=pg database=pg
2005-02-18 14:25:46.394 [20313] LOG: statement: begin; select getdatabaseencoding(); commit
2005-02-18 14:25:46.395 [20313] LOG: duration: 0.862 ms
2005-02-18 14:25:46.807 [20313] LOG: statement: select version()
2005-02-18 14:25:46.808 [20313] LOG: duration: 0.646 ms
2005-02-18 14:26:47.092 [20818] LOG: connection received: host=10.0.1.80 port=45993
2005-02-18 14:26:47.278 [20818] LOG: connection authorized: user=pg database=pg
2005-02-18 14:26:47.461 [20818] LOG: statement: begin; select getdatabaseencoding(); commit
2005-02-18 14:26:47.462 [20818] LOG: duration: 0.792 ms
2005-02-18 14:26:47.696 [20818] LOG: statement: select version()
2005-02-18 14:26:47.696 [20818] LOG: duration: 0.557 ms
2005-02-18 14:27:47.993 [21220] LOG: connection received: host=10.0.1.80 port=46015
2005-02-18 14:27:48.192 [21220] LOG: connection authorized: user=pg database=pg
2005-02-18 14:27:48.384 [21220] LOG: statement: begin; select getdatabaseencoding(); commit
2005-02-18 14:27:48.385 [21220] LOG: duration: 0.961 ms
2005-02-18 14:27:48.560 [21220] LOG: statement: select version()
2005-02-18 14:27:48.560 [21220] LOG: duration: 0.545 ms
2005-02-18 14:28:48.826 [21702] LOG: connection received: host=10.0.1.80 port=46035
2005-02-18 14:28:49.087 [21702] LOG: connection authorized: user=pg database=pg
2005-02-18 14:28:49.318 [21702] LOG: statement: begin; select getdatabaseencoding(); commit
2005-02-18 14:28:49.319 [21702] LOG: duration: 0.809 ms
2005-02-18 14:28:49.516 [21702] LOG: statement: select version()
2005-02-18 14:28:49.516 [21702] LOG: duration: 0.360 ms
2005-02-18 14:29:49.717 [22047] LOG: connection received: host=10.0.1.80 port=46060
2005-02-18 14:29:49.910 [22047] LOG: connection authorized: user=pg database=pg
2005-02-18 14:29:50.138 [22047] LOG: statement: begin; select getdatabaseencoding(); commit
2005-02-18 14:29:50.139 [22047] LOG: duration: 0.831 ms
2005-02-18 14:29:50.320 [22047] LOG: statement: select version()
2005-02-18 14:29:50.321 [22047] LOG: duration: 0.539 ms
2005-02-18 14:30:50.527 [22359] LOG: connection received: host=10.0.1.80 port=46087
2005-02-18 14:30:50.710 [22359] LOG: connection authorized: user=pg database=pg
2005-02-18 14:30:50.927 [22359] LOG: statement: begin; select getdatabaseencoding(); commit
2005-02-18 14:30:50.928 [22359] LOG: duration: 0.855 ms
2005-02-18 14:30:51.105 [22359] LOG: statement: select version()
2005-02-18 14:30:51.106 [22359] LOG: duration: 0.641 ms
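
One way to confirm that a hung client never reached the log is to list the client ports of every logged connection and compare them with the client's local port as reported by netstat. The sketch below assumes the line layout shown in the excerpt above (timestamp, backend PID in brackets, then the message), which depends on the server's logging configuration. The names LOG_LINE and connections are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "<timestamp> [<pid>] LOG: <message>"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] LOG:\s+(.*)$"
)
RECEIVED = re.compile(r"connection received: host=\S+ port=(\d+)")


def connections(lines):
    """Return (timestamp, backend_pid, client_port) for each
    'connection received' entry; all other lines are ignored."""
    found = []
    for line in lines:
        entry = LOG_LINE.match(line.strip())
        if not entry:
            continue
        stamp, pid, message = entry.groups()
        received = RECEIVED.search(message)
        if received:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
            found.append((when, int(pid), int(received.group(1))))
    return found
```

A client port that is absent from the result suggests the postmaster never got as far as logging that connection.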
