From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: weird buildfarm failures on arm/mipsel and --with-tcl |
Date: | 2007-01-24 18:35:44 |
Message-ID: | 45B7A700.3000802@kaltenbrunner.cc |
Lists: | pgsql-hackers |
Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>>> one of my new buildfarm boxes (a Debian/Etch-based ARM box) is
>>> sometimes failing to stop the database during the regression tests:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-01-08%2003:03:03
>>> this only seems to happen sometimes and only if --with-tcl is enabled on
>>> quagga.
>>> lionfish (my mipsel box) is able to trigger that on every build if I
>>> enable --with-tcl, but it is nearly impossible to debug it there because
>>> of how little memory and disk space it has.
>> Hm, could pl/tcl somehow be preventing the backend from exiting once
>> it's run any pl/tcl stuff? I have no idea why though, and even less
>> why it wouldn't be repeatable.
>>
>>> After the stopdb failure we still have those processes running:
>>> pgbuild 3488 0.0 2.4 43640 6300 ? Ss 06:15 0:01 postgres: pgbuild pl_regression [local] idle
>> Can you get a stack trace from this process?
>
> (gdb) bt
> #0 0x406b9d80 in __pthread_sigsuspend () from /lib/libpthread.so.0
> #1 0x406b8a7c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
> #2 0x406b91f8 in pthread_onexit_process () from /lib/libpthread.so.0
> #3 0x40438658 in exit () from /lib/libc.so.6
> #4 0x40438658 in exit () from /lib/libc.so.6
> Previous frame identical to this frame (corrupt stack?)
>
>>> pgbuild 3489 0.0 0.0 0 0 ? Z 06:15 0:00 [postgres] <defunct>
>> This is a bit odd ... if that process is a direct child of the
>> postmaster it should have been reaped promptly. Could it be a child
>> of the other backend? If so, why was it started? Please try the
>> ps again with whatever switch it needs to list parent process ID.
>
> looks like you are right - the defunct 3489 seems to be a child of 3488:
>
> PPID  PID   PGID  SID   TTY  TPGID  STAT  UID   TIME  COMMAND
> 1     3389  18341 18341 ?    -1     S     1001  0:03  /home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data
> 3389  3391  3391  3391  ?    -1     Ss    1001  0:00  postgres: writer process
> 3389  3392  3392  3392  ?    -1     Ss    1001  0:00  postgres: stats collector process
> 3389  3488  3488  3488  ?    -1     Ss    1001  0:01  postgres: pgbuild pl_regression [local] idle
> 3488  3489  3488  3488  ?    -1     Z     1001  0:00  [postgres] <defunct>
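
For anyone who wants to repeat the parent/child check above on another box, here is a minimal sketch. It assumes a procps-style ps that accepts `-eo ppid,pid,stat,args`; the helper name `find_zombies` is made up for illustration and is not part of the buildfarm client or of PostgreSQL.

```python
import subprocess


def find_zombies(ps_text):
    """Parse `ps -eo ppid,pid,stat,args` output (assumed column order).

    Returns a list of (zombie_pid, parent_pid, parent_command) tuples;
    parent_command is None when the parent is not in the listing.
    """
    rows = {}
    for line in ps_text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # not a process row
        ppid, pid, stat = int(parts[0]), int(parts[1]), parts[2]
        args = parts[3].strip() if len(parts) > 3 else ""
        rows[pid] = (ppid, stat, args)

    zombies = []
    for pid, (ppid, stat, _args) in sorted(rows.items()):
        if stat.startswith("Z"):  # Z = defunct, exited but not yet reaped
            parent = rows.get(ppid)
            zombies.append((pid, ppid, parent[2] if parent else None))
    return zombies


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "ppid,pid,stat,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, ppid, parent_cmd in find_zombies(out):
        print(f"zombie {pid} is a child of {ppid} ({parent_cmd})")
```

Run against the listing above, this would report 3489 as a child of the idle pl_regression backend 3488 rather than of the postmaster 3389.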
FWIW - I removed --with-tcl from quagga's configuration about two weeks
ago and it has not failed (for that reason) since. So the issue most
definitely looks pl/tcl related ...
Stefan